ABSTRACT


Data is the foundation of all knowledge; it is integral to our daily lives. Raw, unprocessed data forms 
the basis of scientific discoveries and technological advancements. Insights derived from today's data 
will inform decisions that impact the future. As technological improvements drive the proliferation of 
space big data, applications are being developed to provide real-time analytics to assist the growth of 
businesses, optimize crop yields for farmers, and provide key information to scientists attempting to 
unravel the mysteries of the universe. 


This report provides a definition of space big data and presents a comprehensive analysis of the 
activities of stakeholders within the space big data value chain. After this preliminary identification, we 
identify the key challenges faced by each of these stakeholders in their efforts to harness the 
exponential growth of space data. We studied these challenges and derived four overarching 
challenges that, if solved, would allow us to take full advantage of the value from space big data. We 
propose solutions to these four challenges, touching upon engineering, science, law and policy, and
human performance in space. Finally, we have assembled a road-map based on these solutions, which 
sets out a series of actionable recommendations to achieve the goal of maximizing the potential of 
space big data. 


The timing to start on the road ahead is just right as the emerging space big data domain is at a turning 
point. Coordinated efforts are now needed by industry players to jump the barrier together to ensure 
the potential of space big data is maximized and the industry continues to grow rapidly. 


FACULTY PREFACE 


Any way you look at it, data is the driving force behind the space industry and space research. Imagery,
electromagnetic signals, communication, satellite telemetry, sensor data, observation logs, and 
manned space mission studies all produce enormous amounts of information that must be examined 
again and again to understand more about our universe, our solar system, and our planet. 


While data needs and data collection are key requirements of the space community, data is often regarded
as a byproduct and, typically, a problem that needs to be solved to allow the project to take place. This
way of thinking results in ad-hoc, per-project local solutions, with the same problems solved separately
over and over again for each project. Simply put, data is not being disseminated or shared enough, which
leaves data underutilized and yields little return on the enormous resources invested in its acquisition.


This Space Big Data Team-Project (TP) aims to tackle major big data challenges. The TP tracks the data 
flow from its creators, to its repositories, all the way to its use for research and applications. By
identifying a few representative data flow paths, the TP presents ways to improve this process.


The team itself consists of 35 excellent participants, coming from 15 different countries. The team 
represents a unique variety of professions in the space community and the International Space 
University (ISU) knowledge pillars. This includes engineers, policy makers, research scholars, legal 
practitioners, and scientists. This diverse cohort of professions is enriched by the participants’ spread of 
nationalities, allowing the group to address the complex problem of space big data in a comprehensive 
way, which makes this report so unique in scope. 


Specifically, the project surveys the use and reuse of space big data, mapping the players and
stakeholders in the knowledge flow process from data acquisition and transmission, through data
processing, data storage, and data-based applications and utilities, to data dissemination. The survey
allows for a detailed study of the benefits, hurdles, and the different aspects (research, social, legal, 
technical) of managing and accessing space-related data. 


Following this study, the project drafts a roadmap and recommends strategic actions for 
implementation by the space community, in general, and data providers - agencies, universities, 
companies, research institutes - in particular. The aim of this roadmap is to improve the knowledge 
flow process and essentially extract the most value out of space big data. 


Metaphorically speaking, the essence of this project and report is to give the opening serve in the long
tennis game between the big data players. We hope that the concepts presented here will specifically 
emphasize the need for addressing explicit data-related aspects of future space programs, and will lay 
the path and the basic principles for how data is treated in the space community. 


Barak Fishbain 


TEAM PREFACE


“Knowledge of the past and of the places of the Earth is the ornament and food of the mind of man.” 
- Leonardo da Vinci


This final report is a tangible analogy to the process of mapping and solving the challenges facing space
big data (SBD). Over the course of four weeks, 35 people from 15 different countries, speaking 15 
different languages, were placed in a room and asked to write about space big data. We were 
instructed to produce actionable suggestions based on an intensive review of literature and case 
studies. 


Space big data is an incredibly interdisciplinary topic; it lies at the intersection of the global space 
enterprise and the burgeoning big data market. After defining the nebulous term “space big data,” we 
worked together to gather information, process it in a way that enabled us to extract value from the 
research, and apply our newfound knowledge to solving the issue at hand. We found that getting to the 
peak of the space big data mountain was a Herculean effort. 


Each member of the SBD team brought their own perspective from their respective fields of expertise 
as well as from their backgrounds and cultures. Throughout the project, each team member was faced 
with trying situations, but whether it was communication, technical challenges, or something else, we 
found a path to success together. Our intercultural, international, and interdisciplinary team discovered 
that a lot of smart people sitting together in a single room can come up with some pretty good ideas, 
but it takes a lot of effort to piece it all together, much like the problem of space big data. 


However, the more we studied the problem, the easier it became to deal with it. It really helped to split 
up the key players in the SBD value chain and to identify common overarching challenges for the 
purpose of outlining a comprehensive roadmap. Some elements of the roadmap are utopian, while others
are more feasible to accomplish today. We hope that, if nothing else, this report brings awareness:
awareness about space big data, its players and its challenges. 


This report represents an important milestone in the journey toward understanding and effectively 
using space big data. The space big data industry is at a turning point and the players in the emerging 
market are gearing up to jump the barrier and embark on the journey ahead. It is incredibly rewarding 
to have worked as a team to construct a figurative launch pad by mapping the players and challenges, 
and provide recommendations for the voyage ahead. We are hopeful that we will see a joint effort 
from government, academic, and industry players in the months and years to come toward a collective 
effort to launch the space big data era. 


1. Introduction


Data is the main product generated in the space sector. We send satellites into space to gather data 
about the Earth and the universe (remote sensing and astronomical observation), to transfer data from 
one place to another (communication), and to support location-based services (navigation). 
Additionally, data obtained about Human Performance in Space (HPS) provides the lifeline for human 
spaceflight and exploration. Data is meaningless, however, if it is not converted into information 
through processing and analysis (Bellinger, Castro, and Mills, 2004).


Over the last decade, the concept of “big data” has taken the world by storm through its impact in 
generating meaningful insights for the internet industry. Companies like Google and Facebook have 
invested heavily in the development of innovative solutions to organize and utilize massive real-time 
data (Monnappa, 2015). Nevertheless, the question of what big data addresses is a topic of hot debate 
within the community, and penning a unified definition remains an open problem (station10, 2015). 


Space big data (SBD) is an important subset of the big data universe. It can be defined as “massive 
spatio-temporal Earth and space observation data collected by space-borne and ground based sensors” 
(European Commission, 2016; ESA, 2016), but this definition is not universally accepted. SBD can also 
be defined in terms of a set of characteristics: volume, velocity, variety, veracity, and variability (Marr, 
2016; Big Data Week Perth, 2016). These 5Vs provide insight into the main factors that determine the 
challenges and opportunities for SBD. Since the field is in its infancy and further research is required to 
fully grasp its nature, for the purposes of this report, we define SBD as: a large variety (e.g., images, 
text, log files) and huge quantity of rapidly generated, unstructured space data. 


SBD matters because, with the use of suitable methods and technologies, we can extract valuable 
information to assist with global issues such as: agriculture; forestry and fisheries; biodiversity and 
environmental protection; civil protection and humanitarian aid; climate and energy; public health; 
tourism; transport and safety; and urban and regional planning (Copernicus, 2016a). For instance, 
accurate weather forecasting using space data during 2012’s Superstorm Sandy considerably reduced the
disaster’s toll on human lives (AIA, 2012). Satellite imaging data has also significantly improved
understanding of ocean salinity, which is a key component for global climate models (AIA, n.d.). 


The SBD sector is growing rapidly, which makes it essential to assess its potential impact on the
world without delay. For instance, on a weekly basis, the Square Kilometer Array (SKA) project will
generate more data than has been produced throughout human history (NASA, 2013; Gilliland, 2015).
ESA’s Sentinel-2A, an Earth Observation (EO) satellite, acquires a massive 1.6 terabytes of data per day
(Armbruster, 2016). Moreover, from the commercial perspective, a survey undertaken by Transparency
Market Research (2016) states that the turnover of the global commercial satellite imaging market was
about US$2.50 billion in 2014 and is expected to grow at 11.4% per year, reaching US$6.48 billion by
2023. These figures indicate that Earth observation data, which is a large contributor to space big
data, will continue to grow.
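
As a rough sanity check on the market figures quoted above, the short sketch below computes the compound annual growth rate implied by the two endpoint values and, conversely, projects the 2014 value forward at the quoted rate. It assumes nine compounding years (2014 to 2023) and simple annual compounding; the function names are illustrative only, and the survey's own base year and method are not stated here. Under these assumptions the implied rate comes out near 11.2% and the forward projection near US$6.6 billion, so the quoted figures are broadly consistent with each other.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoint values."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value: float, annual_rate: float, years: int) -> float:
    """Value after compounding start_value at annual_rate for the given years."""
    return start_value * (1.0 + annual_rate) ** years


if __name__ == "__main__":
    # Figures quoted in the text (billions of US dollars).
    # The nine-year span 2014-2023 is an assumption.
    start, end, years = 2.50, 6.48, 9
    print(f"Implied annual growth: {implied_cagr(start, end, years):.2%}")
    print(f"2023 value at 11.4% per year: {project(start, 0.114, years):.2f} billion")
```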


Challenges that place constraints on SBD (transport, storage, retrieval, sharing) limit the entire industry. 
SBD also raises a number of challenges related to data generation, transmission, processing, analysis, 
management, and integration. If we manage to address these problems, we overcome the hurdles that 
limit growth, and we provide new means and opportunities for the space sector to develop in ways that 
are currently not possible. 


The aim of this report is to shed light on the landscape of SBD, provide insight into the key challenges 
for further growth, and lay the foundation for a roadmap to surmount barriers. In this chapter, Section 
1.1 outlines the mission statement for our project and provides a succinct summary of our primary 
aims and objectives. In Section 1.2, we summarize the scope of the project and the main constraints. 
Finally, in Section 1.3, we outline our report structure. 


1.1. Mission Statement and Objectives 


To define space big data, map the activities of stakeholders, identify challenges they face, 
suggest potential solutions, and outline recommendations to promote growth of the space 
big data sector. 


This report catalogs the results from our investigation into the fundamental nature of SBD. Our mission 
statement establishes the foundation for the project and leads to a number of primary objectives: 


• To propose a unified definition for SBD
• To generate a comprehensive overview of the activities of stakeholders involved in SBD
• To identify key challenges that the stakeholders currently face
• To analyze the challenges and establish core challenges faced by the industry as a whole
• To generate potential solutions to the core challenges
• To propose a series of recommendations to overcome barriers and promote growth of SBD


1.2. Scope and Constraints 


The scope of this project was defined primarily by the mission statement given in Section 1.1. We
conducted our study under the umbrella of the International Space University’s (ISU) Space Studies 
Program (SSP) 2016, which brought together 35 individuals from 15 different countries with a unique 
blend of interdisciplinary expertise within the space industry, including engineers, law and policy 
makers, and space scientists, to name a few. The project was executed within a period of four weeks, 
placing strong constraints on the outcome of our investigation. 


Our research was focused on developing a comprehensive understanding of the definition of SBD. 
Although we touched upon all data products, we primarily investigated the EO sector. We studied the 
activities of each stakeholder involved in SBD and identified their key challenges. We analyzed these 
challenges to enable us to determine the core challenges that require coordination across the whole 
sector. We also generated a cursory overview of potential solutions to these core challenges, and
developed an initial set of recommendations. These solutions and recommendations could serve as a
starting point for a dedicated follow-up study. 


1.3. Report Structure 


This report is structured as follows. Chapter 2 provides background to SBD, including the definition, 
brief history, value chain analysis and an overview of end users. In Chapter 3, we detail the main 
activities and challenges faced by data manufacturers. In Chapter 4, we discuss data processors. 
Chapter 5 describes SBD applications. Chapter 6 discusses the role and challenges for data repositories. 
The outcome of our in-depth analysis into the activities of these key stakeholders is summarized in 
Chapter 7, where we list core challenges for SBD that were distilled by considering the sector as a 
whole. Chapter 8 follows on from the core challenges and provides a cursory analysis of 
recommendations to reduce barriers and promote growth of SBD. Finally, we provide concluding 
remarks in Chapter 9. 


2. Space Big Data 


SBD is transforming the world we live in. Leading experts look to SBD as a source of exponential growth 
in the space industry, which will serve as a driving force for new commercial opportunities, an incentive 
to stimulate new ways of thinking about the universe and our place in it, and a spearhead for the 
development of disruptive products and services. To maximize our ability to harness this potential, it is 
important to clearly define SBD and study its benefits to society. 


A discussion about the definition of SBD is presented in Section 2.1. Subsequently, we provide a brief 
history of SBD in Section 2.2. Section 2.3 includes an overview of the SBD value chain. Finally, in Section 
2.4 we survey the key activities and challenges in relation to end users. 


2.1. Definition 


The space industry, which includes any activity that is related to or executed in space, is witnessing 
rapid growth. In part, this growth results from the emergence of extensive Low Earth Orbit (LEO) 
constellations; High-Throughput Satellites (HTS); the Global Navigation Satellite Systems (GNSS);
increasingly capable astronomical observatories; and cutting-edge space exploration and human 
missions (Palerm-Serra, 2015). 


Space data is a booming area within the space industry, leading to new opportunities and challenges.
The Square Kilometer Array (SKA), for instance, will generate 700 terabytes of data per second at full
capacity (NASA, 2013), which is equal to the storage capacity of over 3 billion iPhones each day. We
define space data as follows (Marchetti and Soille, 2016):


Space data encompasses all data gathered through activities that utilize space assets 


These space assets include ground-based and space-based astronomical observatories, 
communications satellites, navigation satellites, and Earth observation platforms. The question of how 
massive and rapidly changing space data could be utilized for commercial, scientific, governmental, and 
consumer applications has led to excitement that permeates the space industry. 


SBD is located at the intersection of two worlds: the space industry and the big data universe. To 
understand SBD, we must analyze both of these sectors carefully. Defining SBD is challenging, since 
there is no unified view of what big data represents, despite the concept being ubiquitous (Ward and 
Barker, 2013). A number of definitions for big data can be found in the literature. A Gartner report in 
2001 proposed the '3Vs' model, which states that “Big Data is the Information asset characterized by 
such a High Volume, Velocity and Variety to require specific Technology and Analytical Methods for its 
transformation into Value.” (De Mauro et al., 2016). IBM’s Big Data & Analytics Hub states a newer 4Vs 
model, from which the fifth V for value can be extracted (IBM Big Data & Analytics Hub, 2016). 
Recently, the 5Vs model was proposed, which describes big data in terms of the following set of
characteristics: volume, velocity, variety, veracity, and variability (Marr, 2016; Big Data Week Perth,
2016). Our definition of SBD takes these five characteristics into account. 


Figure 1 illustrates the concept that SBD stems from an intersection of big data and the space industry, 
which gives rise to a wide spectrum of disruptive applications. For instance, SBD enables investment 
managers to create powerful analysis and forecasting tools to reduce risk and increase returns 
(Catapult, 2016). Farmers can optimize resource allocation through cutting-edge analysis of high 
precision, high resolution data, enabling them to gain unprecedented insights into their markets (ISI, 
2016). SBD also helps us to understand our universe. The Gaia mission (Lindegren et al., 2007) is 
identifying and tracking more than a billion stars, enabling scientists to develop deeper insights that 
inform our understanding of who we are and where we come from. These examples represent the tip 
of the iceberg, and speak to the potential transformative impact of SBD on society. 




Figure 1 - Illustration of Space Big Data as the Intersection of the Space Industry and the Big Data 
Universe 


Building on definitions provided in the literature and our analysis, we propose a unified definition for 
SBD: 


Space big data is defined as a large variety (e.g., images, text, log files) and huge 
quantity of (unstructured) space data that is generated rapidly 


2.2. Brief History 


To understand the current SBD sector, it is important to be aware of the history of both big data and 
the space data sectors. It is difficult to pin down a moment in history when the big data industry 
started; however, some consider the efforts to collate census data in the 19th century to be pivotal, 
e.g., the US 1890 census (Wheatley, 2012). A number of authors have attempted to catalog the history 
of big data over the last century (Press, 2013; Barnes, 2013). There remains a degree of uncertainty 
about the timeline, given the fact that the notion of big data is amorphous. The modern perspective on 
big data is considered to have been formulated between 1997 and 2001, through a series of 
publications (Wheatley, 2012). 


The space sector is at the forefront of the big data discussion. Astronomers and engineers have often 
led the way in developing advanced techniques to compile, handle, and analyze large data sets. For 
instance, at the beginning of the 20th century, astronomers used data from tens of thousands of stars
to establish a relationship between luminosity and temperature, which still forms the backbone of our
understanding of stellar evolution (Hertzsprung, 1908; 1911). During World War II, British codebreakers
(often women) used mechanical computers to process and analyze massive quantities of intercepted 
messages. This effort played a pivotal role in ending the war (Budiansky, 2002). After the war, women 
were employed in key roles to analyze large quantities of data for aviation and missile development, as 
well as spacecraft orbit determination (Light, 2016). 


During the 1950s, computers started to become more powerful and were introduced for data 
processing and reduction. This movement was exemplified by the establishment of the World Data 
Centers (WDC), which were proposed during the International Geophysical Year (IGY) (1957-1958). The 
WDCs greatly improved our management and archiving capacity for large scientific datasets (Aronova, 
Baker and Oreskes, 2010; Korsmo, 2010). 


Since the 1950s, there have been rapid strides forward in the space industry, which have led to the 
development of advanced systems for big data management, storage, processing, and dissemination. 
Nevertheless, the explosion of internet data over the last decade means that the space industry has an 
opportunity to make use of other cutting-edge infrastructure, tools, and methodologies. 


The SBD sector is at a turning point, with anticipated widespread, exponential growth within the next 
few years. Understanding how to capture this growth and extract the greatest economic and social 
value requires in-depth analysis of the key stakeholders and their activities. Figure 2 illustrates this idea 
by capturing the current state of the general big data industry compared to the SBD sector. 





(a) Current state of big data (b) Current state of space big data 


Figure 2 - Schematic Illustration of Current State of the (a) Big Data Universe compared to the (b) Space 
Big Data Sector 


As we are on the cusp of the age of SBD, our project is part of the pioneering work being undertaken to 
define and survey the challenges and opportunities within this field. Key players in the space industry 
are putting significant effort into charting the characteristics of big data for space, and we are seeing 
these efforts come to fruition through applications created by companies like Orbital Insight,
RSMetrics, Descartes Labs, and SpaceKnow. In addition, creating customer awareness will be a major
expense, as enterprises scale up and markets evolve (Northern Sky Research, 2016). 


2.3. Value Chain 


Value chain analysis, or life cycle analysis, is based on the idea of surveying a series of activities
that create and build value that is eventually delivered to customers (Porter, 2008). Various published
studies discuss the value chain for big data. Rayport and Sviokla (1995) were among the first to apply the value
chain definition to big data in their work on virtual value chains: “the flow of information undergoing a 
series of transformations from raw collected data to the meaningful information can be defined as the 
big data life cycle” (Curry, 2016). Miller and Mork (2013) also provide insight into the value chain for big 
data by developing a framework around data discovery, data integration, and data exploitation. 


Studying the SBD value chain provides a means to understand the value-adding activities within the 
sector and support the development of strategies to extract the aggregate value. As the technology and 
industry surrounding SBD have evolved and matured, so has the challenge of extracting meaningful
insights out of continuously produced raw data. Consequently, the ability to manage data and produce 
value is now a key competitive advantage. We investigated activities of the following principal 
stakeholders, to map the value generated at different stages in the SBD value chain: 


• Data manufacturers
• Data processors
• Data applications
• Data repositories

In the following, we briefly summarize the main characteristics of these stakeholders, drawing on Curry (2016):


• Data manufacturing refers to the “process of generating, gathering, filtering, and cleaning data
before it is put in a data warehouse or any other storage solution where data analysis can be 
carried out” (Curry, 2016). The raw data collected often does not have an applicable use, but 
must flow on to the data processors to generate meaningful results. 

• Data processors are concerned with analyzing and understanding “raw data acquired to use in
decision-making as well as domain-specific usage. It involves exploring, transforming, and 
modeling data with the goal of highlighting relevant data, and then synthesizing and extracting 
useful hidden information with high potential from a business point of view. Related areas 
include data mining, business intelligence, and machine learning” (Curry, 2016). 

• Data applications refer to data-driven activities that analyze and integrate data analysis within
a given activity. “In business decision-making, [data applications] can enhance competitiveness 
through reduction of costs, increased added value, or any other parameter that can be 
measured against existing performance criteria” (Curry, 2016). 

• Data repositories are the multitude of data storage and management systems. Repositories
must be “scalable” to meet “the needs of applications that require fast access to the data”. 
Therefore, data deposits must also be accessible in some way (Curry, 2016). 


Each stakeholder is required to interact with different levels of processed data. Table 1 summarizes the 
definitions of standard processing levels belonging to the data life cycle. 


Table 1 - Definition of Data Processing Levels (Sarma and Ato, 2014) 


Level 0: Reconstructed, unprocessed instrument/payload data at full resolution; any and all
communications artifacts (e.g., synchronization frames, communications headers,
duplicate data) removed.

Level 1A: Reconstructed, unprocessed instrument data at full resolution, time-referenced, and
annotated with ancillary information, including radiometric and geometric calibration
coefficients and geo-referencing parameters.

Level 1B: Level 1A data that have been processed to sensor units (not all instruments have Level
1B data products).

Level 2: Derived geophysical variables at the same resolution and location as the Level 1 source
data.

Level 3: Variables mapped on uniform space-time grids, usually with some completeness and
consistency.

Level 4: Model output or results from analyses of lower level data, e.g., variables derived from
multiple measurements.
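

To illustrate how the processing levels in Table 1 could be handled in software, the following is a minimal sketch that represents them as an ordered enumeration. The labels Level 0 through Level 4 follow the usual numbering convention for these definitions; the class and function names are hypothetical and are not taken from the cited source or from any existing library.

```python
from enum import Enum


class ProcessingLevel(Enum):
    """Data processing levels, listed from least to most processed."""
    LEVEL_0 = "0"    # reconstructed, unprocessed payload data
    LEVEL_1A = "1A"  # time-referenced, annotated with ancillary information
    LEVEL_1B = "1B"  # Level 1A data processed to sensor units
    LEVEL_2 = "2"    # derived geophysical variables
    LEVEL_3 = "3"    # variables mapped on uniform space-time grids
    LEVEL_4 = "4"    # model output or analyses of lower level data


# Definition order doubles as processing order.
_ORDER = list(ProcessingLevel)


def parse_level(label: str) -> ProcessingLevel:
    """Parse labels such as 'Level 1a', 'level 2', or '1B' (hypothetical helper)."""
    normalized = label.strip().upper()
    if normalized.startswith("LEVEL"):
        normalized = normalized[len("LEVEL"):].strip()
    return ProcessingLevel(normalized)


def is_more_processed(a: ProcessingLevel, b: ProcessingLevel) -> bool:
    """Return True if level a comes later in the processing chain than level b."""
    return _ORDER.index(a) > _ORDER.index(b)


if __name__ == "__main__":
    product = parse_level("Level 1a")
    print(product, is_more_processed(ProcessingLevel.LEVEL_2, product))
```

An ordered enumeration of this kind would let a repository or processor check, for example, that a requested product is at least at the level a given stakeholder works with; this is only one possible design.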





End users, as principal beneficiaries of the value extracted from SBD, are the reason the value
chain exists, as they define the need for data and information. End users include people and organizations
from different industrial sectors (private and public) that leverage big data technology and services to 
their advantage (Curry, 2016). As end users experience complex interactions with all the stakeholders, 
we treat their activities and their challenges separately. This allows us to understand stakeholders in 
terms of the industry as a whole. We introduce end users in Section 2.4, prior to delving into the profile 
of each main stakeholder. Subsequent chapters will outline the activities and challenges faced by each 
of the aforementioned stakeholders. 


2.4. End Users 


In the big data era, powerful computing and analysis capabilities have given scientists and academics 
the possibility to use all the available data to derive more reliable conclusions within a shorter time 
span. The gap between SBD and end users is narrowing, as institutions and commercial entities 
continue to open up access to space data. Application development based on SBD is also becoming an
increasingly important part of educating the next generation.


The SBD value chain ends with the applications segment, which is employed by end users, typically in 
the form of information services. In this section, we provide several examples of how different types of 
end users are applying SBD for specific use cases. We also provide an overview of the needs of end 
users that require prioritization. 


Figure 3 below summarizes the complex interactions between end users and the main stakeholders in 
the SBD value chain. End users come from different areas of society, including the government, 
scientific domains, commercial entities, and the general public. 


Figure 3 - Key Stakeholders and End Users in the SBD Value Chain and Their Relationships


To satisfy the needs of the end user, all segments in the SBD value chain must identify and analyze
their mutual interactions. As Philippe Armbruster, Head of ESA’s Data Systems Division, has pointed
out, understanding end users matters because of their relationship with the whole SBD value chain
and because of the need for feedback links between the users and different parts of the chain
(Armbruster and Kearney, 2016). For instance, market trends derived from current end user interests
are a driving force behind new developments in the application segment of the value chain.


As the final segment in the value chain, end users receive the commercial value of SBD through various 
applications; a thorough understanding of this segment and its users is vital to transfer the value and 
interest of SBD to other domains. 


Governmental end users focus on public service and assist in the decision-making process to support 
the economy, disaster management, infrastructure building, public transportation, and military 
strategy. Unlike other end users, governments can directly impact all the sections of the industry 
through laws, policies, and budgets. Governments play multiple roles: they are an integral, active
stakeholder in different segments of the value chain, and they are also end users of the SBD
services that they develop. For example, the Chinese High Resolution Earth Observation System
(CHEOS, 2016) is one of the 16 major projects of China’s “Long-term Scientific and Technological
Development Plan (2006~2020)” (Gov.cn, 2006), which means the Chinese central government devises
the plan, carries out the project, and funds it. CHEOS is meant
to provide information services and decision support for important areas of Chinese modern
agriculture, disaster prevention and mitigation, resources and the environment, and public safety
(CNSA, 2014b). Clearly, the government is one of the most important end users that would use the data
from the CHEOS system to provide services for the general public.


After disasters and other crisis events, non-governmental organizations (NGOs) and government 
agencies are increasingly leveraging SBD to support first responders. On October 8, 2005, after an 
earthquake in the Kashmir region of Pakistan, multiple groups responded by creating maps from 
satellite imagery within 24 hours of the earthquake. Einar Bjorgo of UNOSAT said: "We receive requests 
from end users, including the UN agencies, NGOs and also government representatives, which we 
discuss with our Respond partners, then decide who produces what maps based on the needs 
expressed" (European Space Agency, 2005). 


A prominent case of governmental use of SBD in the U.S. took place at the time of Hurricane Sandy in 
2012. In Figure 4, we show the difference between the predictions of the hurricane landfall with and 
without using polar weather satellite data, and then the actual landfall. Because SBD enabled accurate
landfall prediction five days in advance, the government was able to reduce large financial costs and,
most importantly, save lives by evacuating people from the coast in time (AIA, 2012).





Figure 4 - The Predicted Landfall of Hurricane Sandy a) Without use of Space Big Data, b) With Use of 
Space Big Data, and c) the Actual Landfall (AIA, 2012) 


Scientific end users are those who use SBD or SBD services to help prove or disprove their theories 
and/or to achieve new discoveries. These end users include scientists, researchers, students, and even 
amateur enthusiasts. Scientists and researchers occupy several roles, since they function within various 
levels of the SBD value chain. They often work with raw data to process and transform it into value- 
added data, which then becomes useful scientific output. Their needs often depend on support coming 
from subsidies allocated by their designated governments. For example, the Chinese space research 
project funded the Five-hundred-meter Aperture Spherical Telescope (FAST), also known as Tianyan, 
which will conduct stellar, galactic, and extragalactic radio astronomy (Wall, 2016). Using FAST, 
scientists will be able to conduct deep space research and are expected to develop new perceptions 
and understanding about the early days of the universe. 


The non-profit sector is often supported by the commercial sector, ensuring scientists are able to 
execute their research according to their needs and requirements. The commercial sector also 
attempts to ensure that the scientists are provided with the freedom for innovation and R&D, to create 
new opportunities in the market through better data collection and analysis. The SKA, with an objective 
similar to Tianyan, will deliver useful SBD translated into EO, planetary, and weather forecasting data
for commercial purposes (SKA, 2016). 


Technical progress within the SBD industry, such as improved onboard data processing and
infrastructure, has given old-style science a boost in coping “...with nature’s complexities by seeking
the underlying simplicities in the sparse data acquired by experiments” (Siegfried, 2013).


As a final example within science, space data offers the potential to answer some of the biggest
mysteries in astronomy, such as dark matter. CERN scientist Dragan Hajdukovic proposed a theory
involving matter and antimatter with opposite gravitational charge; if his theory is proven true, it would
mean that dark matter does not exist. To prove or disprove this theory, scientists must use SBD, and
results can be expected within a couple of years (CERN, 2014).


Commercial end users pay for services provided by the SBD industry directly or create profit for the
industry indirectly. In other words, commercial end users buy value-added SBD, which makes them the
customers of the industry. Usually, they are not concerned with what is behind the SBD services, such as
where the data comes from, how it is processed, and how it is stored: they are blind to the activities
within the SBD sector. They care only whether SBD can help them solve their problems. They do not pay
for SBD itself; they pay only for the value extracted from SBD. Since they provide a large fraction of
the financial means for stakeholders in the SBD value chain, they are a key factor in determining the
prosperity and sustainability of the SBD industry.


These commercial end users can be divided into individual and organizational users. The individual end
user is the general user of applications such as navigation, weather forecasts, and traffic analysis, who
directly or indirectly pays for the services. Organizational end users, on the other hand, depend on
processed SBD to deliver services to their customers. These services include aviation navigation,
agricultural services to farmers, water management, and urban planning. Two examples of commercial
organizational end users are Orbital Insight (Orbital Insight, 2016b) and OmniEarth (OmniEarth, 2016).
US-based Orbital Insight leverages SBD and artificial intelligence for analyzing retail traffic, monitoring
global oil storage, estimating global water reserves, “estimating harvest yields,” and mapping other
economic activity.


DigitalGlobe Inc. (2016a) is a company that both provides and analyzes raw data for specific purposes 
and for general distribution. If a company or organization is interested in remote sensing data, then 
DigitalGlobe can generate the associated data and also process it so that it is presented in a way that is 
useful to the consumer. In 2015, Amnesty International used data gathered by DigitalGlobe (2016b) 
and processed it to understand the impact that Boko Haram had on population movement and on cities 
in Nigeria. Amnesty International is hoping to use this data to track Boko Haram’s movements in 
anticipation of where more damage will be done in the future. 


Public end users are those who enjoy free public SBD services provided by the government public sector.
In such a scenario, the users of these services are the taxpayers, who have already paid for the right to
enjoy the services. As a result, the public has a rightful influence on governments to meet its demands
for more useful public application services. For example, space policies and regulations are often
decided on behalf of the public; in the long term, the public is the SBD user that stimulates
governments, commercial entities, and the space industry.


In a broad sense, almost every one of us belongs to the category of public end users of SBD today.
Public end users include recreational enthusiasts (e.g., fishermen, boaters, surfers), who use weather
data to plan their fishing activities (NASA Spinoff, 2009) and boating activities (SurferToday,
2012); amateur astronomers and passionate stargazers (Voo, 2016); aviation spotters, who use the flight
tracking application Flightradar24 (Flightradar24, 2016); and users of educational applications such as
Solar Walk (Vitotechnology, 2016).


According to the ways in which SBD is used, end users can be divided into direct end users and indirect 
end users. Direct end users are those who directly use raw data or processed data, which means that 
they themselves have to extract valuable information using SBD software and tools. Many scientific end 
users belong to this category. For example, astronomers usually have to use data mining tools to 
analyze large astronomical repositories and surveys to achieve new discoveries. We can say that direct 
end users generally need specific professional knowledge and skills to handle SBD. On the other hand, 
indirect end users are those who use information extracted from SBD, rather than using large volumes 
of data. They do not need any professional knowledge or skills to process and analyze SBD. For 
example, Google Earth users are indirect end users. They can easily enjoy the services of Google Earth 
without knowing any details behind the application. Generally, most commercial end users are indirect 
end users. 


Furthermore, according to whether SBD is used currently, end users can be divided into current end 
users and potential end users. The SBD industry is in its infancy; to date, only a small fraction of the 
general public has had the opportunity to benefit from a small fraction of the overall value generated 
through SBD. There are still large potential needs to be fulfilled by the SBD industry, and large potential 
value to be extracted from SBD. It is the mission for all the SBD stakeholders to work together to 
convert potential end users into current end users; however, this is not just an outreach issue. The 
number of SBD end users ultimately depends on the quality and quantity of the value created by the 
SBD industry. The more value the industry creates, the more potential end users will become current 
end users. 


Having established the end user profiles, in Chapters 3 through 6 we provide in-depth analysis of the 
stakeholders identified within the SBD value chain (Section 2.3). 


3. Manufacturers 


Manufacturing and generating data is the first step in the big data lifecycle. Raw data is data that has 
been collected but not yet analyzed or processed (TechTarget, 2009), so data manufacturers are among 
the most important parties involved. When identifying and describing manufacturers, it is important to 
include factors such as when data is collected, how it is collected, and who is collecting it. It is 
imperative that we first understand the core principles of data collection before moving forward to 
connect to the rest of the value chain. This data could include imaging data; non-visual measurements
such as temperature, location, or the magnitude of any geophysical parameter; or even human data from
manned missions.


3.1. Manufacturing Systems 


Just as there are many types of data, there are many types of systems that collect the data itself. 
Although SBD is often mistakenly thought of as solely data collected by satellites, this data only makes 
up a portion of the total volume. Much of the SBD generated today also comes from sensors and 
mechanisms on the ground at Earth observatories (Angelfire, 2016), in addition to that collected in 
space via satellites or during manned missions (Space.com, 2016). Space data manufacturing systems 
can be broken down into two main categories: ground-based and space-based systems. 


3.1.1. Ground-Based Systems 


Ground-based SBD manufacturing systems produce space-related data without the need to actually be
in space. Governments, commercial providers, academic institutions, and even individual citizens
operate these systems, which include various radio and optical telescopes and space tracking radars.
Ground-based observatories and sensors are often located in rural, dry, high-altitude locations to
avoid interference from other signal-producing devices, atmospheric interference, and atmospheric
moisture that absorbs higher-frequency signals (Chaisson, McMillan, 2002). These facilities are able to
generate an immense volume of data. The Square Kilometer Array (SKA, 2016), for example, will produce
about 700 TB of data per second when it becomes fully operational (JPL, 2016). This volume represents
ten times the entire output of the Internet (Gilliland, 2015).
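
To put the quoted rate in perspective, the arithmetic can be sketched as follows. This is a minimal illustration that assumes decimal units (1 EB = 1,000,000 TB) and a sustained rate of 700 TB per second; the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_exabytes(rate_tb_per_s: float) -> float:
    """Convert a sustained rate in TB/s into a daily volume in EB (decimal units)."""
    tb_per_day = rate_tb_per_s * SECONDS_PER_DAY
    return tb_per_day / 1_000_000  # assumption: 1 EB = 1,000,000 TB


# Rate quoted in the text for the SKA at full operation.
print(f"{daily_volume_exabytes(700.0):.2f} EB per day")
```

Under these assumptions the quoted rate corresponds to roughly 60 EB per day.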





Figure 5 - Radio Telescopes of the SKA (JPL, 2016) 


Another important product category of ground-based manufacturers is space service-related big
data. For instance, telecommunication and navigation services do not generate data directly onboard,
but the associated service data has great commercial value and should also be considered SBD. The next
generation of the Iridium constellation will be able to collect the location and direction signals of
planes and send them to ground operations centers to optimize flight routes. Location-based services
such as Uber collect large numbers of customer positions obtained from GPS. These service-related data
are a good example of downstream value from SBD that also generates further data.


3.1.2. Space-Based Systems 

The sensors and equipment that produce data in space make up the space-based big data providers. 
Onboard space-based sensor systems are integrated systems of all sensor payloads on a vehicle in 
space as shown in Figure 6. 


Figure 6 - A Sketch of Integrated System of Sensors On-board Envisat (European Space Agency, 2016c)


With the advancement of modern sensors and computing technologies, the ability to generate space 
data from space assets has increased dramatically (Pixelytics, 2014). The most common payloads that 
produce big data are Earth observation instruments, space telescopes, and several kinds of in-space 
experimental instruments - including many on the ISS. This data has traditionally been collected by 
governments; however, there has been a recent trend toward commercialization in the space industry 
and, accordingly, toward the privatization of manufacturers (Meyer, 2016). 


3.1.3. Earth Observation Instruments 


Earth observation data come from a very diverse combination of sources (laser, radar, optical, etc.),
with multi-temporal capacity (data collected at different times) and multi-layer capacity (different
spatial resolutions or different frequency bands). Thanks to that variety and to improvements in sensor
performance, the gathered data have grown large and include very different formats. To illustrate that
variety, Figure 7 below shows the distribution of typical Earth observation spatial resolution in
relation to revisit time (McKinnon, 2015). It also highlights the continuous drift towards
high-performance spacecraft, which creates new big data providers.


Figure 7 - Spatial Resolution vs. Revisit Time (McKinnon, 2015)


Earth observation sensors can be classified as either active or passive. Active sensors emit radiation in 
the direction of the target they observe. The active sensor detects and measures the radiation reflected 
or backscattered from the target. An important distinction between the two types of sensors is that 
passive sensors only detect radiation emitted or reflected naturally by the object they observe. Sunlight 
is often the source of radiation for passive sensors, which observe the reflected light (NASA Earthdata, 
2016). Table 2 shows types of active and passive sensors with specific examples of each. The 
information obtained by these instruments is raw data or Level 1 data. 
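
The ranging principle behind active sensors can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular instrument: it assumes the pulse travels at the vacuum speed of light in both directions, and the function name and the 4 ms echo delay are hypothetical.

```python
# Speed of light in vacuum, in metres per second (exact SI value).
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def range_from_round_trip(round_trip_seconds: float) -> float:
    """Return the one-way distance in metres for a measured round-trip time.

    The emitted pulse travels to the target and back, so the path is halved.
    """
    if round_trip_seconds < 0:
        raise ValueError("round-trip time must be non-negative")
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


# Hypothetical example: an echo received 4 milliseconds after emission.
distance_km = range_from_round_trip(4e-3) / 1000.0
print(f"Target range: {distance_km:.1f} km")
```

A 4 ms round trip corresponds to a range of roughly 600 km, which is the order of magnitude of a low Earth orbit altitude.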


Table 2 - Examples of Active and Passive Sensors Used as Space-Based Payloads (NASA Earthdata, 2016) 


Sensor | Active/Passive | Description | Potential Applications | Example
LIDAR: Light detection and ranging (Angelfire, 2016) | Active | Uses lasers to transmit a light pulse and measure the backscattered or reflected light through a receiver with sensitive detectors | Distance or range to the target | DESDynI
RADAR: Radio detection and ranging/Synthetic Aperture Radar | Active | Emits microwave radiation in series of pulses. The reflected/backscattered energy is detected, measured and timed | Distance or range to the target, images | DESDynI
Scatterometer | Active | High-frequency microwave radars used to measure backscattered radiation | Maps of surface wind speed and direction | Oceansat-2
Panchromatic camera | Passive | Digital image | Produce image | SPOT 7
Hyperspectral or multispectral radiometer, imaging radiometer, spectroradiometer | Passive | Detects multiple spectral bands in the visible, near infrared, and mid infrared band | Discrimination between targets, intensity of electromagnetic radiation in specified wavelengths | VHRR and onboard NOAA and Metop satellite
Spectrometer | Passive | Usually uses a prism to disperse and measure the radiation | Spectral content of electromagnetic radiation | Tropospheric Emission Spectrometer


3.1.4. Space-Based Observatories

Space-based observatories are research infrastructures that provide scientific observations of the
universe, from the solar system to the cosmological background. Space observatories may have space
telescopes, radar, and other imaging instruments physically in space to observe distant planets,
galaxies, and other astronomical objects. Placing an observatory in space offers the unique advantage
of a more stable environment with virtually no perturbations due to Earth’s atmosphere (Melina,
2010). Space-based observatories also offer the possibility of longer observation periods than are
available from the ground, where observations are hindered by day-night cycles, seasonal interruptions,
and weather effects (Turon, 2010). Scientists use space observatories and telescopes to study
neighboring planets as well as galaxies that are billions of light years away. Some of the major space
observatories are listed below in Table 3 (Space.com, 2016).


Table 3 - Major Space-Based Telescopes and Their Wavelength of Operation


Observatory | Operator | Operational since | Objective | Wavelengths | Observation Instruments
Hubble Space Telescope | NASA, Space Telescope Science Institute, ESA | | Deep space objects | Visible, UV, Near-Infrared | Near Infrared Camera; Multi-Object Spectrometer; Advanced Camera for Surveys; Wide Field Camera; Cosmic Origins Spectrograph; Space Telescope Imaging Spectrograph
Spitzer Space Telescope | NASA/JPL/Caltech | | Infrared telescope | Infrared | Astronomical imaging; Photometry; Spectroscopy; Spectrophotometry
Fermi Gamma Ray Space Telescope | NASA/US Dept. of Energy | | Gamma ray studies | | Large Area Telescope (LAT); Gamma-ray Burst Monitor
Swift Gamma Ray Burst Explorer | NASA/GSFC | 2004 | Gamma ray studies | Gamma ray, X-ray, UV, Optical | Burst Alert Telescope; X-ray Telescope; Ultraviolet/Optical Telescope
INTEGRAL | ESA/RKA/NASA | 2002 | Gamma ray studies | Gamma ray, X-ray, Optical | Imager on-Board the INTEGRAL Satellite; Spectrometer for INTEGRAL; Anticoincidence Shield; Joint European X-Ray Monitor; Optical Monitor Camera; INTEGRAL Radiation Environment Monitor
XMM-Newton | ESA | 1999 | X-ray astronomy | X-ray, Optical | European Photon Imaging Camera; Reflection Grating Spectrometer; Optical Monitor
Chandra X-ray Observatory | NASA/SAO/CXC | 1999 | Telescope detect X- | | AXAF CCD Imaging Spectrometer (ACIS); High Resolution Camera (HRC)
Herschel Space Observatory (space) | | | Far Infrared and Submillimeter Telescope | Far-Infrared | Heterodyne Instrument for the Far Infrared; Photodetector Array Camera and Spectrometer; Spectral and Photometric Imaging Receiver
Planck Observatory (space) | | | Low Frequency Instrument, High Frequency Instrument | Microwave | High Frequency Instrument; Low Frequency Instrument




Figure 8 - Example of Space Observatory - Hubble Telescope (NASA, 1997) 


3.1.5. In-Space Experimental Instruments and Rover Data Collection


The microgravity and radiation environment in space provides a unique setting for conducting research. 
Specifically, it yields important data that can be collected exclusively in this environment. The 
instruments for the biology, biotechnology, Earth and space science, physical science, future 
technologies, and human research experiments carried out onboard the International Space Station are 
key producers of space big data. Some of the instruments used to obtain medical information from 
astronauts include the Advanced Diagnostic Ultrasound in Microgravity (ADUM) instrument, the 
Advanced Resistive Exercise Device (aRED), the Cycle Ergometer with Vibration Isolation System 
(CEVIS), the Combined Operational Load Bearing External Resistive Exercise Treadmill (COLBERT), and 
thermometers (Clement, 2011). 


As of today, the majority of data gathered about humans in space has come from government missions
to the ISS and other manned spaceflight missions. As the commercial space travel industry develops,
the need for a comprehensive understanding of how the human body will be affected by travel to
space has come to the forefront (NMLegis, 2016). As the commercial spaceflight industry begins to
send more humans into space, the volume of data collected to better understand the risks involved in
human spaceflight will likely continue to grow. This growth is driven not only by risk, but equally by
the increased use of sensors to monitor the vital signs and overall health of space farers. While future
measurements will likely mirror those currently being collected from astronauts, the added liability and
privacy risk could lead to a much more comprehensive and granular collection of traveler data. The
effects to be monitored could range from the physical effects of high g-forces at liftoff to the
psychological effects of leaving Earth (Foster, 2016).





Figure 9 - Example of Human performance in space - International Space Station (NASA, 2012) 


In addition to data collection from in-space experiments aboard the ISS, spacecraft and rovers like
ESA’s Rosetta spacecraft and NASA’s Curiosity rover are also continuously collecting data. This data
allows space agencies like NASA to analyze planetary surfaces, climate, and signs of life, and it
supports further applications, such as assessing human habitability.


3.2. Data Downloading


For SBD, the downloading process involves the transfer of data from where it was generated to the 
potential processors, repositories, and/or directly to end users if no processing is needed first. For 
ground-based data manufacturers, this downloading process is achieved mainly through fiber networks 
(IRA, 2016). Some of the ground manufacturers that don’t have access to terrestrial networks use 
telecommunication satellites to download the data. For space-based manufacturing systems, onboard 
processing and storage capability are constrained by power and space limits (System Identification, 
2003). Data generated in space must be downloaded to the ground for initial or further processing and 
analysis to provide meaningful results. Once data has been downloaded to the ground, the various 
repository and dissemination methods will be employed to store and share it. 


With the latest advances in sensor technology, the real-time data generation on spacecraft has become 
faster and faster. For example, data generated from the Charge-Coupled Device (CCD) on some Earth 
observation satellites can reach up to 1.14 Gbps per CCD chip (Zhang and Shuyan, 2009). The 
downloading capacity, however, is restricted by spectrum, bandwidth, gain, antenna size, and 
transmission power (Tozer, 2016). These are all necessary restrictions and will continue to exist as long 
as there is the need to separate data manufacturing from processing and storage, especially because 
the data generation rate has increased at a faster rate than the processing and downloading rates. To 
bear the sharply increasing data transmission burden, the data manufacturers (sensor technologies) 
must optimize spectrum use, increase antenna size, and develop more efficient power supplies, while 
controlling costs or reducing the data volume needing to be downloaded. 


Generally, there are two ways of downloading SBD. The first way is by using direct space-to-ground
communication stations. For most Earth observation satellites that adopt inclined or sun-synchronous
low Earth orbits (LEOs), the orbital period is approximately 90 minutes and the time window to connect
to a ground station is typically around 10 minutes. For this reason, data must be stored onboard the
satellite and transmitted to the ground station over the course of several hours. For satellites
in geostationary orbit (GEO), one ground antenna pointing to the specific satellite maintains the
line of sight needed to allow continuous data downloading.
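
The figures above can be combined into a rough downlink budget. The sketch below is illustrative only: it assumes a circular orbit, standard published values for Earth's mean radius and gravitational parameter, and a hypothetical 300 Mbps downlink and 400 km altitude; the function names are our own.

```python
import math

# Assumed physical constants (standard published values).
EARTH_MU_M3_PER_S2 = 3.986004418e14  # Earth's gravitational parameter
EARTH_RADIUS_M = 6_371_000.0         # mean Earth radius


def circular_orbit_period_minutes(altitude_km: float) -> float:
    """Period of a circular orbit at the given altitude (Kepler's third law)."""
    semi_major_axis_m = EARTH_RADIUS_M + altitude_km * 1000.0
    period_s = 2.0 * math.pi * math.sqrt(semi_major_axis_m ** 3 / EARTH_MU_M3_PER_S2)
    return period_s / 60.0


def data_per_pass_gigabytes(downlink_mbps: float, window_minutes: float) -> float:
    """Upper bound on data moved in one ground station pass, ignoring overhead."""
    bits = downlink_mbps * 1e6 * window_minutes * 60.0
    return bits / 8.0 / 1e9


# Hypothetical LEO satellite at 400 km with a 300 Mbps downlink and a 10-minute window.
print(f"Orbital period: {circular_orbit_period_minutes(400.0):.1f} min")
print(f"Data per pass:  {data_per_pass_gigabytes(300.0, 10.0):.1f} GB")
```

Under these assumptions the period comes out a little over 92 minutes, consistent with the approximately 90 minutes quoted above, and a single 10-minute pass moves at most 22.5 GB.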


The second way to transmit data to the ground is through space-based data relay. In these systems, the
onboard data is transmitted to a relay satellite, and then from the relay satellite to the ground. For
example, the experimental instruments on the International Space Station (ISS) generate huge amounts
of data, and several communication upgrades have been carried out on the ISS to boost its scientific
output. The U.S. Tracking and Data Relay Satellite System (TDRSS) in geostationary orbit provides the
ISS with almost continuous real-time communications to the ground (Nguyen, Hadjitheodosiou, and Baras,
2004). The Ku-band download and upload links on the ISS are now 300 Mbps and 25 Mbps, respectively
(Cecil et al, 2014). The U.S., EU, Japan, and China have developed the following data relay satellite
systems:


Table 4 - Countries and Their Corresponding Data Relay Systems in Orbit (Yang, H., 2016) 


Country | Data Relay Satellite Systems in Orbit
USA | TDRS, SDS (military)
Russia | Luch





The latest data relay systems have reached up to an 800 Mbps download link on Ku/Ka bands (Yang, H.,
2016). It is very hard to further increase the data rate with radio technology, so laser communication
experiments are being carried out as the next potential breakthrough (Yang, H., 2016). Laser
communication is well suited to space communication links between space vehicles and relay satellites.
Laser communication links accommodate speeds up to 1 Gbps and transmit with a stronger signal because
light waves are packed much more tightly than radio waves. However, one drawback is that the atmosphere
easily affects laser communication. Physical obstacles like birds, space debris, and tree limbs can
obstruct the laser’s line of sight. The laser signal can also be attenuated by atmospheric
particles such as aerosols and other particulate matter (Ricklin et al, 2006).
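
To compare the link rates quoted in this section, the following sketch estimates how long a continuous, uninterrupted link would need to move a fixed volume of data. The 1 TB volume, the decimal units, and the function name are assumptions made for illustration; protocol overhead and link outages are ignored.

```python
def transfer_time_hours(volume_terabytes: float, link_mbps: float) -> float:
    """Hours needed to move a data volume over a link at a constant rate."""
    if link_mbps <= 0:
        raise ValueError("link rate must be positive")
    bits = volume_terabytes * 1e12 * 8.0   # decimal terabytes to bits
    seconds = bits / (link_mbps * 1e6)
    return seconds / 3600.0


# Link rates quoted in the text, in Mbps.
LINK_RATES_MBPS = {
    "ISS Ku-band downlink": 300.0,
    "Ku/Ka-band data relay": 800.0,
    "Laser link": 1000.0,
}

for name, rate in LINK_RATES_MBPS.items():
    print(f"{name}: {transfer_time_hours(1.0, rate):.2f} h per TB")
```

Even at the highest quoted rate, a single terabyte occupies the link for more than two hours, which illustrates why the downlink rather than the sensor tends to be the bottleneck.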


Conclusion 


Data manufacturing is an important process representing the foundation of the SBD supply chain that 
enables the subsequent levels. Space-based and ground-based data manufacturers have been 
producing an increasing amount of space data through improved technologies. Nonetheless, there are 
many open challenges remaining and areas for improvement in both data manufacturing and the 
connections between data manufacturers and the rest of the data value chain. 


3.3. Manufacturing Challenges 


Data manufacturers carry the burden of setting the tone for the data to be used later in the value
chain, while also reconciling that with requirements from the entity controlling the manufacturing
mission itself. In addition to the specific task of generating data, manufacturers must also help
provide the link between data collection and the users and processors that will use the data. The
resulting challenges include clear insufficiencies in current technologies as well as ancillary legal
and business concerns.


3.3.1. Data Downloading and Transmission 


After initial review, it appears that one of the most significant challenges that manufacturers face
regarding space data is the limitation of the downloading process. The bandwidth of space-based
telecommunication systems is constrained, and the time window for downloading data from a satellite to
a ground station antenna is limited (typically 10 minutes, as described above).


Ground-based sensors such as telescopes and radars don’t usually face the same data downloading
challenge since they can use fiber optic cables to transfer the data from one point to another. Any
improvement that raises the downloading limit is extremely valuable to the engineers of data
manufacturers, who are always striving to improve their capacity. In the end, it seems that the
amount of data generated is not actually limited by mankind’s ability to use data. Instead,
manufacturers (and this extends to the entire SBD lifecycle) are limited by their ability to download
and, further on, disseminate the data collected.


The distant space environment produces unique constraints for the transfer of data, while ground
transfer from one place to another does not raise many issues (Hart, 1997). Still, several solutions
exist for bringing data from one ground facility to another. Some use space-based links from one
spacecraft to another, for instance via GEO satellites such as the European Data Relay System (EDRS)
(EDRS, 2016). Any time the transfer has to use free space instead of fiber optic cables, there will be
downloading constraints. Spacecraft manufacturing companies are well aware of the problem and
understand that it is directly linked to the amount of data that can be usefully generated. If unsolved,
the downloading rate problem restrains the natural growth of big data; consequently, research and
application development will be restricted, too. That is the reason why spacecraft designers and
telecommunication engineers have to take downloading issues into account.


The challenges that data manufacturers face are not all new. In 1995, the U.S. government had already
identified them (Office of Technology Assessment, 1995). However, because of the growth of data, the
identified challenges now have to be addressed in a much more efficient and cost-effective way when it
comes to SBD. The Hyperspectral Infrared Imager (HyspIRI) mission (Hartzell, et al., 2009) is a good
example of the need for creating and using a disruptive technology.


Insufficient data rate, time constraints, multiple access points, and onboard data storage are all issues
related to data downloading that need to be addressed by the data manufacturers’ engineers. We
decided to consider only those issues related to spacecraft, which are the most challenging in terms of
engineering development.


3.3.2. Insufficient Data Rate 


The data rate from space to ground (or the opposite way) is limited by the channel capacity of the
link. As a result, the data download rate is relatively slow compared to the overall data generated
onboard, and further improvement seems to be difficult. The Shannon-Hartley theorem explains why there
is a maximum rate at which actual information can be transferred over a communication channel
(Mathuranathan, 2008). The link relies on a limited, specified bandwidth. According to this theorem,
the channel capacity depends on the available bandwidth and on the signal-to-noise ratio, and therefore
on the power of the transmitters. It is also constrained by signal attenuation due to the long range
and by link disruptions related to the environment through which the signal is transmitted. In
addition, because the power of the transmitters is strictly constrained by the power budget of the
solar panels, additional limitations can result from the design of the spacecraft (Office of Technology
Assessment, 1995). To organize and share this limited resource, the available bandwidth is defined and
legally assigned by the International Telecommunication Union (ITU) (ITU, 2013).
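
The Shannon-Hartley relation, C = B log2(1 + S/N), can be evaluated directly. The sketch below is a minimal illustration; the 100 MHz bandwidth and 10 dB signal-to-noise ratio are hypothetical values, not parameters of any real mission, and the function names are our own.

```python
import math


def db_to_linear(ratio_db: float) -> float:
    """Convert a power ratio from decibels to a linear ratio."""
    return 10.0 ** (ratio_db / 10.0)


def shannon_capacity_bps(bandwidth_hz: float, snr_linear: float) -> float:
    """Shannon-Hartley channel capacity in bits per second."""
    if bandwidth_hz < 0 or snr_linear < 0:
        raise ValueError("bandwidth and SNR must be non-negative")
    return bandwidth_hz * math.log2(1.0 + snr_linear)


# Hypothetical link: 100 MHz of allocated bandwidth at 10 dB SNR.
bandwidth_hz = 100e6
snr = db_to_linear(10.0)

baseline = shannon_capacity_bps(bandwidth_hz, snr)
double_bandwidth = shannon_capacity_bps(2.0 * bandwidth_hz, snr)
double_power = shannon_capacity_bps(bandwidth_hz, 2.0 * snr)

print(f"Baseline capacity:    {baseline / 1e6:.1f} Mbps")
print(f"Twice the bandwidth:  {double_bandwidth / 1e6:.1f} Mbps")
print(f"Twice the power:      {double_power / 1e6:.1f} Mbps")
```

Doubling the bandwidth doubles the capacity, whereas doubling the transmitter power adds less than one bit per second per hertz, which is why spectrum allocation weighs so heavily on the achievable download rate.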


3.3.3. Time Constraints 


The amount of data that can be transmitted to the ground every day also depends upon the number of
short opportunities available to download it to the ground stations. LEO spacecraft are usually
continuously visible for only approximately ten minutes per ground station (Pearson et al., 2016).
Space data is therefore stored on board between downloading windows and has to be transferred as
soon as possible, since the storage capacity is limited and a missed opportunity to download may
interfere with future collections of data. Every satellite design therefore has to balance the capacity
to produce and store data against the capacity to actually download it to the repository. This limit is
tightened further by the fact that every message has to have beginning and ending sections dedicated
to the verification of the data. Consequently, high-performance and reliable compression algorithms
are required to download as much data as possible during every flyover of a ground station. This is
related to the veracity of the data.
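
The balance between data production and download capacity can be made concrete with a simple budget.
The sketch below uses the ten-minute visibility window quoted above; the number of passes, the link rate,
the verification overhead, and the daily data production are hypothetical values introduced only to
show the arithmetic.

```python
def daily_downlink_bits(passes_per_day: int, pass_seconds: float,
                        link_rate_bps: float, overhead_fraction: float) -> float:
    """Payload bits that can reach the ground per day.

    overhead_fraction is the share of each message spent on the beginning and
    ending sections used for verification, so it carries no payload.
    """
    if not 0.0 <= overhead_fraction < 1.0:
        raise ValueError("overhead_fraction must be in [0, 1)")
    usable_rate_bps = link_rate_bps * (1.0 - overhead_fraction)
    return passes_per_day * pass_seconds * usable_rate_bps


def required_compression_ratio(generated_bits: float, downlink_bits: float) -> float:
    """Minimum compression ratio needed to download everything that was generated."""
    return max(1.0, generated_bits / downlink_bits)


# Assumed values: 4 passes per day, 10 minutes each, 100 Mbit/s link, 10% overhead.
capacity_bits = daily_downlink_bits(4, 10 * 60, 100e6, 0.10)

# Assumed payload production of 100 gigabytes per day (8e11 bits).
generated_bits = 100e9 * 8
ratio = required_compression_ratio(generated_bits, capacity_bits)

print(f"daily downlink capacity: {capacity_bits / 8e9:.1f} GB")
print(f"required compression:    {ratio:.2f} : 1")
```

Under these assumptions the spacecraft can deliver about 27 GB per day, so it would need to compress
its output by a factor of roughly 3.7, reduce collection, or add ground station passes.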


Time delay also degrades the value of certain types of space data. Ideally, all data produced would be 
transmitted to the ground as soon as possible, which is again problematic with LEO satellites. Apart 
from a few military satellites, real time continuous downlinks are not usually a requirement for 
satellites. If data is downloaded in real time, the various stakeholders can use it in an optimal fashion. 
For example, time is crucial for Earth weather forecasting, since weather conditions are often rapidly 
changing. In that case, as for many others, time delay can render the information outdated, invalid, and 
practically useless. For crisis management, timeliness is also critical. The ground situation may already 
be completely different if the data is downloaded after a significant time delay. 


3.3.4. Format and Standards 


Data format and standardization are not primarily an engineering challenge for manufacturers, who
only comply with the requirements of their customers. The customer must specify what it
expects from the manufacturer at the earliest stage of the mission design, especially regarding the
payload. If they are not asked to comply with a specific format, data manufacturers may end up
providing very different types of data that may not be compatible with other available data. Such data
may then be useless for customers who plan large-scale merged analysis. Thus, if the data owner
expects to sell its product to a wide range of customers, it should focus on standardizing its data
requirements across the value chain, which will affect manufacturing standards; however, it is
imperative that the entire chain adapt simultaneously.


3.4. Conclusion 


Raw data is manufactured from a variety of different sources, often each with its own unique format. 
Manufacturers must be able to provide customers with valuable data to provide valuable products. 
With regard to other constraints, manufacturers face a growing need to improve the data downlink 
from space. More specifically, if SBD is to be useful, it needs to be accessible in almost real time. There
are many steps to take to resolve these issues, which will be discussed in later chapters. However, the 
first step is to understand both the nuances of data manufacturing and the origin of space data as well 
as how it connects with the rest of the value chain. 


4. Processors


Once data has been manufactured and stored, it can be processed. The generated data creates a
framework for what humans can learn about the world, but processing the sheer volume of data
generated presents a challenge, and the proliferation of big data exacerbates existing processing
limitations. Before the data can be used, it must be run through software whose algorithms distill
relevant information from the bulk dataset. The processed data can then be used in decision-making as
well as domain-specific usage.


The first step to extracting value is to process the data; SBD is not about the data, but the meaningful 
information that can be extracted from it. Carl French defines data processing as “the collection and 
manipulation of items of data to produce meaningful information” (French, 1996). French’s definition 
asserts the importance of finding meaning in the abundance of data, and the vast number of 
applications necessitates analyzing the same data in different ways, using different methods. 
Processing data involves “exploring, transforming, and modelling data with the goal of highlighting 
relevant data, synthesizing and extracting useful hidden information with high potential from a 
business point of view. Related areas include data mining, business intelligence, and machine learning” 
(Curry, 2016). Because of the high demand for data processing of SBD, government agencies, satellite 
companies, data manufacturers, private data analytics companies, and universities provide data 
processing services. Each of these different processing sources has different priorities, and as a result, 
each has different methods for processing data and distributing findings. 


4.1. Mapping the State of Key Processors 


In the following sections, the key data processors are identified, analyzed, and mapped in their current
and near-future states. In order of appearance, these processors are agencies, commercial
institutions, academia, and crowdsourcing groups. Following on from these, three case studies are
analyzed. The second part of the chapter analyzes the challenges faced by the data processors.


4.1.1. Space Agencies as Processors 


Space agencies are manufacturers insofar as they or their equipment generate SBD, and they are
processors insofar as they process the generated data. One of the primary SBD processing foci for
government agencies is mission data, because a mission may collect a large volume and variety of
data at a high velocity, relating both to real-time mission analytics and to data gathered for
future use. An example of such a mission is NASA’s Mars rover, Curiosity, a semi-autonomous
robot that gathers raw data and sends it back to Earth (Taylor, 2012). At the same time, the
data must be made available in real time for flight operations to work efficiently.
This approach is demonstrated by NASA’s Mission Data Processing and Control System (MPCS).


MPCS works by interfacing with NASA’s network and the Mars Reconnaissance Orbiter (MRO) in orbit
around Mars. The MRO relays data to and from the Curiosity rover, and the raw data is processed in
real time to generate results usable by both Curiosity and flight operations. Curiosity carries out
drilling and mining tasks for the ground base, which in return sends commands for the next
activity. The direct transmission rate between Earth and Curiosity is 8 kbit/s, whereas the rate between
Curiosity and the MRO is 2 Mbit/s. The rate between the MRO and Earth is 256 kbit/s. Relaying data
through the orbiter in this way greatly increases the velocity of data transmission, which improves
efficiency. In addition, the data is configured into custom data visualizations for the flight operations
team. In short, MPCS can take data points related to weather conditions, rover position, and external
forces, and make sense of them in real time. Prior to MPCS, processing this data could take days, so
the enhanced processing ability has greatly improved mission efficiency and general capabilities.
AMMOS (Advanced Multi-Mission Operations System) is an example of MPCS (AMMOS, 2016). AMMOS
“provides most of the ground data system functions needed to design, implement, and operate a
Mission Operations System (MOS)”. The AMMOS Mission Data Processing and Control System (AMPCS)
deals with Telemetry Input Handling, Alarm Processing, Session Handling, Reporting, and Automation
Support.
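
The benefit of relaying through the orbiter can be checked with the rates quoted above, reading 8 kbit/s
as the direct rover-to-Earth rate, 2 Mbit/s as the rover-to-MRO rate, and 256 kbit/s as the MRO-to-Earth
rate. The sketch below is a simplified store-and-forward model: the 10 Mbit payload is a hypothetical
value, and signal travel time and orbiter visibility windows are deliberately ignored.

```python
def transfer_seconds(payload_bits: float, *hop_rates_bps: float) -> float:
    """Time to move a payload across one or more hops in sequence.

    Store-and-forward assumption: each hop must finish before the next one starts.
    """
    return sum(payload_bits / rate for rate in hop_rates_bps)


payload_bits = 10e6  # hypothetical 10 Mbit data product

direct = transfer_seconds(payload_bits, 8e3)            # rover -> Earth
relayed = transfer_seconds(payload_bits, 2e6, 256e3)    # rover -> MRO -> Earth

print(f"direct link: {direct:.1f} s")
print(f"via MRO:     {relayed:.1f} s")
print(f"speed-up:    {direct / relayed:.1f}x")
```

With these figures the relayed path is limited by the orbiter-to-Earth hop, yet it still moves the
payload in under a minute where the direct link would need more than twenty minutes.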


Mission data may also come from human space missions, such as the near continuous monitoring of 
the 55 astronauts while they were aboard the International Space Station, as stated in section 3.1.5. 
Mission data needs to be processed so that value can be extracted from the SBD bulk dataset. The 
value hidden within can then be collated in a meaningful form (human readable). Human readable data 
allows researchers to understand the raw data as well as permitting data to be interfaced and read by 
other systems and languages. 


4.1.2. Commercial Processors 


Despite the efforts of agencies, the versatility of SBD is such that there is potentially a high demand for
customized data analysis that might not be regarded as necessary or worthwhile by or for an agency
because of budgetary, resource, and/or political constraints. The main driver of a private company’s
actions is profitability, and these companies either process their own data for sale or take
requests for new data or analytics. DigitalGlobe Inc. (2016a) is a company that both provides and
analyzes raw data for specific purposes and for general provision. If a company or organization is
interested in remote sensing data, DigitalGlobe can generate the associated data and
process it so that it is presented in a way that is useful to the customer. In 2015, Amnesty International
used data gathered by DigitalGlobe (2016b) and processed it to understand the impact that Boko
Haram had on population movement and on cities in Nigeria. Amnesty International hopes to use
this data to track Boko Haram’s movements and anticipate where more damage will be done in the
future.


In addition to companies providing analytical services, it is not uncommon for companies to provide
software for groups to analyze their own data. Harris GeoSpatial Solutions (Harris, 2015) is an example
of a commercial processor that offers a wide range of software, through Application Programming
Interfaces (APIs), which may be used depending on the type of data to be analyzed. Harris’ data
processing framework has been used by the defense, environmental, and oil and gas industries, and by
academia. The wide-ranging impact of Harris’ software is that researchers and customers are
able to process data obtained from other sources on their own. There is, however, a concern that users
are processing data for their own use without making the results available to others. Another
commercial example is Terra Bella, a startup acquired by Google (2016). Terra Bella is involved in
several parts of the data value chain, from manufacturing (producing its own small satellites) through
processing to applications (the company’s product is analytics built on satellite-produced data and
applied to concrete real-world issues).


4.1.3. Academia 


While data processing elsewhere is largely driven by public interest and profit, in academia the purpose
of research and learning relates to global well-being and quality of life on Earth. With the
feasibility of life on the planet a key factor in developing data processing technology, students and
professors have the advantage of being free from the need for profit and from much of the political
turmoil involved in agencies’ data processing, so they often use this opportunity to conduct higher-risk
or lower-profitability studies.


Many of these higher-risk studies are geared towards improving existing technology and lead to the
advancement of data processing. One advantage is that academia often has access to data and
resources that a private company might not. To illustrate, the Massachusetts Institute of Technology
(MIT) developed the Julia programming language (Bezanson, 2016) to streamline programming.
Julia was meant to make data analysis easier, but it is also suitable for extremely
complex datasets. It has been used in areas ranging from spaceflight mapping to bioinformatics and
environmental initiatives.


4.1.4. Crowd Sourced Processing 


Crowdsourcing is the practice of obtaining services through contributions from a large group of people
rather than from traditional employees or suppliers. Its interest for SBD is that crowdsourced
processing can bridge the gap between human and machine computation (Anhai, Raghu, and Alon,
2011). Human intellect and intrinsic human values give people a clear advantage in understanding
images and signals, which are the main form of SBD. In this sense, the crowd is known as the Human
Processing Unit (HPU), which solves high-level challenges collectively. For example, Zooniverse is a
platform for crowdsourced research on which volunteers from all around the world help to analyze
various pieces of information more quickly and accurately than computers, accelerating the research
process. Crowdsourcing has been introduced into databases, data mining, social media research, and
applications (Lei, Dongwon, and Tova, 2015). If the distribution and policy issues are solved, HPUs will
play an important role in the extraction of high-level information from SBD.
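
One common way to turn many volunteer judgments into a single result is a majority vote with an
agreement threshold. The sketch below is a generic illustration of that idea and is not a description of
how Zooniverse or any other platform aggregates its classifications; the tile names, labels, and the
0.6 threshold are hypothetical.

```python
from collections import Counter


def aggregate_labels(votes, min_agreement=0.6):
    """Combine volunteer labels for one item by majority vote.

    Returns (label, agreement). The label is None when the most common answer
    does not reach the agreement threshold, flagging the item for expert review.
    """
    if not votes:
        return None, 0.0
    label, count = Counter(votes).most_common(1)[0]
    agreement = count / len(votes)
    return (label if agreement >= min_agreement else None), agreement


# Hypothetical volunteer classifications of two image tiles.
votes_by_tile = {
    "tile_001": ["crater", "crater", "crater", "dune"],
    "tile_002": ["cloud", "ice", "cloud", "ice"],
}

results = {tile: aggregate_labels(votes) for tile, votes in votes_by_tile.items()}
for tile, (label, agreement) in results.items():
    status = label if label is not None else "needs expert review"
    print(f"{tile}: {status} (agreement {agreement:.0%})")
```

Items on which volunteers disagree are not discarded; they are the ones worth routing to a specialist,
which is one way human and machine processing can be combined.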


4.2. Challenges for Processors 


The wide range of data content has created an environment with the potential to affect everyone. 
Navigating the political arena to define what is and what is not legal and accepted practice has been a 
challenge (NASA Science, 2016b). Even with adequate data availability, access, and legal viability, there 
are many cases where processing technology is unable to effectively process data in a meaningful way. 
The overarching challenge of big data processing stems from its definition: the high volume, velocity,
variety, and veracity of the data require processing to extract more value from it. In their
current state, data processors face a number of key technical challenges. The most significant
challenges facing data processors are identified as standardization, accessibility, interfaces, and
data processing methodology. To generate even more value that better reflects the entire scope of the
data, better technological tools for processing are needed. This subsection analyzes these key
challenges.


4.2.1. Lack of Standardization 


The term “data standardization” refers to methods that create uniform data sets, based on common 
factors of definition, format, representation, and structure (IBM Knowledge Center, 2011). Just as with 
data manufacturers, there is a standardization problem for data processors. Adopting common 
approaches to data standardization increases the data consistency and credibility. Data users should be 
able to clearly identify what type and quality of raw data is available to them that will allow them to 
make effective use of it and to extract higher-level information. In general, the new world of big data
does not follow the same principles of standardization as traditional research (Sarma and Ato, 2014).
Having a standardized approach to processing SBD implies a full understanding of the resources and
data sets available. The “unstructured nature” (Microsoft, 2016) of SBD has made standardization a
challenging issue to resolve.


Beyond the technical challenges of establishing a true data standard and ensuring that data is ready to
be processed, there is also a major legal and political barrier to development. It is exacerbated by
the fact that multiple countries and agencies are gathering overlapping data in different formats and
using different identifiers for similar information. For example, the generation of observational data
must conform to any legal and political agreements that exist between different countries.
Additionally, certain frameworks that encourage the sharing of data across borders, such as those for
weather data, do not apply to imaging data (Copernicus, 2016d).


Currently, data processors cannot effectively combine different forms of data, even if they are
similar, because there is a severe lack of standards for formatting and labeling. Lack
of coordination, financial viability, and legal and political factors prevent interrelated big data from
being stored in one location. This leads to a combination of repetitive and disorganized data that is
difficult to process. Even where there is a way to compile and cross-reference related and unrelated
information, a lack of accessibility still stands in the way. It is difficult to reference
different data sets together, as there is no standardized platform for using that data.
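
The formatting and labeling problem can be illustrated with two records that describe the same
measurement under different identifiers and units. The sketch below maps both onto one common schema.
The field names, the alias table, and the sample values are hypothetical and stand in for whatever
conventions two real providers might use.

```python
# Hypothetical alias table: many provider-specific names map to one canonical name.
FIELD_ALIASES = {
    "lat": "latitude_deg",
    "latitude": "latitude_deg",
    "lon": "longitude_deg",
    "long": "longitude_deg",
    "longitude": "longitude_deg",
    "temp_k": "temperature_k",
    "temp_c": "temperature_c",
}


def standardize(record: dict) -> dict:
    """Rename fields to canonical names and express temperature in kelvin."""
    out = {}
    for key, value in record.items():
        canonical = FIELD_ALIASES.get(key.lower(), key.lower())
        out[canonical] = value
    # Convert Celsius to kelvin so that every record shares one unit.
    if "temperature_c" in out:
        out["temperature_k"] = out.pop("temperature_c") + 273.15
    return out


# The same observation as two hypothetical agencies might label it.
agency_a = {"LAT": 48.52, "LON": 7.74, "temp_k": 288.15}
agency_b = {"latitude": 48.52, "long": 7.74, "temp_c": 15.0}

standard_a = standardize(agency_a)
standard_b = standardize(agency_b)
print(standard_a)
print(standard_b)
```

Every new provider requires another set of aliases and unit rules, which is why agreeing on a shared
standard upstream is cheaper than reconciling formats downstream.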


The barriers to processing SBD are all interrelated and present unique challenges. Even when processors
are able to process data, the risk of mis-sharing data, breaking laws, or infringing upon established
policies creates an unprecedented challenge for those trying to make sense of raw data. This is because
SBD is generated at an international level, in space, where much of the data is relevant for the entire
world. Until there is better coordination across the entire sector in terms of both regulations and
storage standards, processors will continue to be unable to maximize their efficiency and usage of SBD
(IBM, 2016).


4.2.2. Accessibility 


Accessibility can be divided into two categories, Open Access and Restricted Access, depending also on 
the processing level. Open Access means that data can be accessed by the general public, while 
Restricted Access is defined as a limitation of access on data to a certain group of people such as 
scientists, governments and/or agencies. 


Data from data processors, in the case of Earth Observation (EO), is available at various processing
levels. NASA defined a processing level standard (NASA Science, 2016c) for its Earth Observing System
(EOS), which has been adopted by other players in the field. This can be found in Table 1 within the
introduction to chapter 2. The challenge is that not all levels of processed data are available to
everyone and not everyone can request high resolution images of an area of interest. For instance,
ESA’s EO program Copernicus provides Level 0 - Level 3 products free of charge through a website that
requires a simple registration. In addition, it is possible to request cloud computing time to perform
processing on the raw data. For high resolution observation of specific areas or for a large amount of
computing power, ESA requests a proposal before granting access. The end-user agreements for using
freely available EO data are defined by the data manufacturers. Rescheduling of EO mission plans is
allowed on the basis of multilateral agreements that give priority to certain regions for monitoring and
acquiring EO data, which can also apply in exceptional cases such as disasters.


Commercial EO satellites use a subscription model under which a customer can subscribe to access
archived EO data. The pricing is based on the size of the observed area and the number of bands
requested. Examples are Worldview 1-3, Quickbird, Geoeye, and Ikonos. EO data from military
satellites is generally restricted and not accessible to the public. However, a number of dual-use
satellites are in orbit and share the same instrument for civil and defense purposes. In such a situation,
priority is usually given to defense operations over scientific or commercial use. Not all data is available
for an unlimited amount of time, as storage capabilities are limited. In the example of the EO data
from the EU’s Copernicus program, data is stored for one month in a rolling archive. This creates two
dilemmas. The first is that scientists are unable to apply future processing techniques to older data, an
approach which has in the past proven successful and yielded new discoveries. The second is that, since
data is only stored for a limited period, it cannot be used to generate reliable trends. This can create a
problem if a high data output space telescope is operational and the amount of data produced cannot be
stored long enough for second-order scientists. In the case of the Gaia spacecraft, an ESA mission to
observe and map faint stars and the galaxy at large, the scientific data is not being made readily
available. As a result, the analysis is done exclusively by the scientists contributing to the mission rather
than being open to the wider community.
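
The effect of a rolling archive on trend analysis can be shown with a small simulation. The sketch below
treats the one-month retention mentioned above as 30 days, which is an assumption, and the class and
product names are hypothetical. It does not model how any real archive is implemented.

```python
from datetime import date, timedelta


class RollingArchive:
    """Keeps only the products acquired within the retention period."""

    def __init__(self, retention_days: int = 30):
        self.retention = timedelta(days=retention_days)
        self._products = {}  # acquisition date -> product identifier

    def ingest(self, acquired: date, product: str) -> None:
        """Store a new product and drop everything older than the retention period."""
        self._products[acquired] = product
        cutoff = acquired - self.retention
        for old in [day for day in self._products if day < cutoff]:
            del self._products[old]

    def span_days(self) -> int:
        """Length of the time series that can still be reconstructed."""
        if not self._products:
            return 0
        return (max(self._products) - min(self._products)).days

    def __len__(self) -> int:
        return len(self._products)


# One hypothetical product per day for a full year.
archive = RollingArchive(retention_days=30)
start = date(2016, 1, 1)
for offset in range(365):
    archive.ingest(start + timedelta(days=offset), f"scene-{offset:03d}")

print(f"products ingested: 365, still available: {len(archive)}")
print(f"longest recoverable time series: {archive.span_days()} days")
```

After a year of acquisitions only the most recent month remains, so seasonal or multi-year trends
cannot be derived unless a user has copied the data elsewhere in time.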


4.2.3. Data Processing Method 


Data processing has two major problems: data mining and multi-source data fusion. Data mining
algorithms limit computer-based processing (Microsoft, 2016). Data mining algorithms determine 
which data is referenced, how it is applied, and the models that result from it (Visa, 2013). The use of 
data mining algorithms has resulted in significantly faster processing speeds when compared to human 
processing or non-algorithm based computer processing. Currently, the technology that enables 
computer-based processing cannot effectively process and understand images and signals and how 
they relate to each other, which represents much of SBD. The lack of flexible data mining algorithms 
that can effectively use big data has proved a difficult hurdle to overcome (Microsoft, 2016). This 
represents a “Semantic gap” between what computers can process and what human experts can 
extrapolate from the same data (Gancarski, 2014a). Other forms of big data can be processed and 
referenced through other methods that do not apply to Space Big Data. For example, internet big data 
is processed and understood by Google, using available technology, because internet big data is largely 
text-based. Even images searched through Google are referenced using text-based tags associated with 
the image. The same framework does not exist in SBD, creating a dilemma of still-undeveloped 
technology. 


Using predictive analysis to understand trends with processed data has been a challenge because of 
further issues relating to storage. Predictive analytics often relies on trending data over time and 
combining the results with ongoing data to better understand future results. With the data storage 
duration challenge, predictive analytics capabilities are negatively impacted by the current state of big 
data processing. 


Another challenge pertaining to processing is the limited influence that downstream users have on
onboard processing. To optimize data usage downstream, satellites use a set of algorithms that analyze
the data before transmitting a higher-level processed product to Earth. For example, clouds obscure
land areas for Earth observation satellites that operate in the visible spectrum. Scientists and users
obtain better imaging results once almost all the clouds have been removed by the algorithms onboard
the satellites. On the other hand, big data revolves around the core principle of using data for different
purposes, even where these are not the primary goal of the mission and the satellite. As a result of
onboard processing tailored to the needs of one primary data user, the raw data that could be useful to
other users is not readily available.
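
The trade-off can be illustrated with a toy cloud screen. The sketch below flags bright pixels as cloud
using a single reflectance threshold, which is a deliberate simplification: operational cloud detection
relies on several spectral bands and tests. The 3 x 3 scene and the 0.6 threshold are hypothetical.

```python
def cloud_mask(reflectance, threshold=0.6):
    """Return True for every pixel bright enough to be treated as cloud."""
    return [[value >= threshold for value in row] for row in reflectance]


def apply_mask(reflectance, mask, fill=None):
    """Replace cloudy pixels with a fill value, as an onboard filter might do."""
    return [
        [fill if cloudy else value for value, cloudy in zip(row, mask_row)]
        for row, mask_row in zip(reflectance, mask)
    ]


# Hypothetical visible-band reflectance values for a 3 x 3 scene.
scene = [
    [0.12, 0.75, 0.80],
    [0.10, 0.15, 0.65],
    [0.09, 0.11, 0.14],
]

mask = cloud_mask(scene)
cleared = apply_mask(scene, mask)
cloud_fraction = sum(sum(row) for row in mask) / (len(scene) * len(scene[0]))

print(f"cloud fraction: {cloud_fraction:.0%}")
print(cleared)
```

The land-use analyst receives a cleaner product, but a user studying the clouds themselves can no longer
recover the discarded pixel values once only the masked scene has been transmitted.
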


Whether they are experts or amateurs, when humans process data themselves or interpret
processed results, the data must appear in a format they can access. Because most data processing is
done by computers rather than humans, any processing that must be repeated or extended by humans
requires visualization analysis, that is, the ability to represent the data visually. Since not
all data will be viewed by humans, there is also the challenge of determining which data needs to
be visualized at all. These compounding barriers to smooth mixed human and computer processing
schemes limit Space Big Data’s potential for easier crowdsourcing or mass dissemination of data for
processing.


4.2.4. Interfaces 


Data interfacing is the process by which data from different sources or formats are combined or
compared effectively. Current interface systems are not set up to effectively limit the number of
standard formats, and sharing data is often a difficult task. At the moment there is no universal
application program interface (API), which could be an effective solution to this issue. An API is a set of
routines that allows users to combine software packages. Data interfacing includes not only data
integration (the import of data to the respective system or platform) but also the ways in which these
systems interact with each other.


In some big data-related simulation systems, different types of data represent different types of
information, which means that they are generated at different times by different sensors, satellites,
and missions. Without a universal or even a relatively flexible interface, there is no
easy way to generate processed results from different sources or types of data. Coordinating
such a large variation in the data has proven to be one of the largest challenges in big data. In
addition, there is sometimes a need to incorporate historical data that has been archived on
non-computer media. The results, if achieved, are not produced in real time as a universal API would
allow, leading to a degradation in the quality of the application of such data. Unfortunately, for a
universal API to be successful in the current data environment, it would need not only to be a
confluence of every existing standard, but also to be adaptable to any future changes.
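
Short of a universal API, a flexible interface can be approximated by a single entry point that hands each
format to its own adapter. The sketch below shows this pattern with two text formats from the Python
standard library. The function names, the registry, and the sample data are hypothetical, and real
mission formats would need far richer adapters.

```python
import csv
import io
import json
from typing import Callable, Dict, List

Record = Dict[str, str]

# Registry of format name -> adapter that turns raw text into a list of records.
_READERS: Dict[str, Callable[[str], List[Record]]] = {}


def register(fmt: str):
    """Decorator that adds an adapter for one data format to the registry."""
    def decorator(func):
        _READERS[fmt] = func
        return func
    return decorator


@register("csv")
def read_csv(text: str) -> List[Record]:
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


@register("json")
def read_json(text: str) -> List[Record]:
    return [{key: str(value) for key, value in item.items()} for item in json.loads(text)]


def load(fmt: str, text: str) -> List[Record]:
    """Single entry point: callers never need to know how a format is parsed."""
    try:
        reader = _READERS[fmt]
    except KeyError:
        raise ValueError(f"no adapter registered for format {fmt!r}") from None
    return reader(text)


# Two hypothetical sources delivering the same kind of reading in different formats.
csv_text = "sensor,value\nA,1.5\n"
json_text = '[{"sensor": "B", "value": 2.5}]'

combined = load("csv", csv_text) + load("json", json_text)
print(combined)
```

Supporting a new format means registering one more adapter rather than changing every consumer, but
the registry still has to grow with each standard, which reflects the adaptability problem noted above.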


4.3. Concluding Comments 


Different groups, companies, and institutions all serve as data processors in different capacities, 
representing the conflicting and overlapping interests of different bodies in this segment of the supply 
chain. However, each type of processor is part of the larger picture of how the industry can draw 
meaningful results from raw, or even previously processed, data. Data processing capabilities and 
variety lead to possible downstream applications, and dictate, to a certain extent, the scope of the 
Space Big Data industry. 


The challenges identified relate to data processing methodology, interfacing, and accessibility. 
Accessibility is important: without access, the purpose for which the data is collected is irrelevant. In 
addition, limiting access inhibits research being conducted on the data, which could form the basis for 
new technological developments. Linked to that is the need for data to be capable of being processed. 
Regarding interfaces, the lack of a universal API was identified as both a challenge and a utopian 
solution, showing how challenging the issues are, as the challenges may not be black and white. 


5. Applications


5.1. What are SBD Applications? 


Space data has the power to revolutionize whole industries by providing accurate, reliable, and 
consistent information to drive decision-making. Diverse fields such as agriculture, transportation, 
fishing, and retail all stand to benefit from big data. Applications of SBD inform government decision- 
making and improve commercial use of data. Space applications generate added value from space big 
data (ESRI, 2016; Buczkowski, 2016) for public and private sectors. Applications that harness space data 
generally fall into one of the following categories (Tan, 2016): 


• remote sensing and GIS
• satellite navigation
• satellite telecommunications
• astronomical observation and space microgravity science


The applications section focuses on remote sensing applications, since the benefits of this particular 
field are easily described and demonstrated. The data applications stakeholders and market trends are 
described in the following sections. 


5.1.1. Stakeholders 


The key stakeholders in SBD applications are private and public, with the market including 
governments, private companies, non-profit organizations, and individual end users. Each stakeholder 
interacts with the other entities based on its background, goals, and resources. 


Governments can use SBD to drive decision-making, particularly in the forestry and land use sectors.
Applications developed for weather forecasting are among the most visible examples of SBD use (Sala,
2016). The use of weather satellites to forecast the path of Hurricane Sandy, described in Section
2.1, allowed the U.S. government to plan its natural disaster response accurately, protecting millions of
lives and saving billions of dollars (AIA, 2012).


Governments also share SBD to encourage economic growth. A common trend is for governments to 
build open access platforms that present space data to the general public. This provides an opportunity
for commercial entities to extract value from SBD. Sample initiatives by governments and related
organizations to provide public access to SBD are shown in Table 5 below. 


Table 5 - Initiatives to Provide Open Access to SBD (Khoso, 2016; CHEOS, 2016; Orbital Insight, 2016; 
Keating, 2016) 


Initiative | Governments and Organizations | Description

 | | Offers the public access to a catalog of NASA datasets and tools.

 | International collaboration originally established by NASA | A two day hack-a-thon where international innovators solve challenges using data sets from NASA and global partners.

Copernicus (2016b) | European Commission/ESA | Provides free access to data from Sentinel satellites in the Copernicus constellation.

CHEOS (2016) | | National high resolution Earth observation data centers distributed around China to promote the application of remote sensing data.





Private companies and non-profit organizations are using space data applications to address social, 
economic, and humanitarian needs. Growth of these companies stems from a number of recent trends, 
including increases in computing speeds, advances in distributed computing, decreases in cost of access 
to space, and miniaturization of electronics. Over recent years, hundreds of millions of U.S. dollars from
private sources have been invested in private, for-profit organizations to create new markets and
disrupt existing ones. The majority of the current organizations allow paying customers secure
access to proprietary data, a set-up termed walled gardens. Table 6 describes examples of companies
using space data and Table 7 provides examples of non-profit applications that crowdsource the 
analysis of SBD. Figure 10 shows examples of SBD applications developed for public and private 
purposes. 


Table 6 - Private Companies that Generate Value Through the Use of SBD 


Private Company | Country | Description

Gyana (2016) | UK | Uses a combination of space and social media data to capture changes in population mood at different times and places.

Spire (2016) | USA | Assists search and rescue operations, maritime domain awareness, insurance, and trade monitoring. Tracks illegal fishing and piracy.

ESRI (2016a) | USA | Provides geographic insights with ArcGIS® software for business location planning, asset management, urban planning, land use monitoring, crime modelling, and predictive modeling.

SpaceKnow (2016) | | Tracks economic trends such as trading and manufacturing indexes.

Orbital Insight (2016) | | Identifies trends from earth observation data using artificial intelligence software to provide insight for industries including retail, oil storage, global water reserves, and agriculture.

Beijing 21 STC (2016) | | Operates Beijing-1 and Beijing-2 Earth observatory satellites and provides data processing, applications, and life cycle services.

Beijing Geoway (2016) | | Provides an integrated multi-source remote sensing image processing platform, GEOWAY Storm, and application integrated service.





ArcGIS - Manhattan Coffee 








Figure 10 - Public and Private Applications of SBD (Commercial and Police Intelligence Mapping). Left 
figure shows locations of coffee chain Starbucks in New York City using heat map imagery (Kerski, 
2015). Right figure shows a screenshot of ArcGIS® and ModelBuilder™ used by Lincoln Police 
Department for intelligence-led policing (Dhami, 2011). 


Table 7 - Non-Profit Organizations that Generate Value Through the Use of SBD 


Non-Profit Organization | Country | Description 
Zooniverse | UK | Web-based platform for professional researchers to utilize hundreds of thousands of volunteers to conduct citizen science. 
Standby Task Force | | A global online network for citizen volunteers to analyze remote sensing data after natural disasters and conduct election monitoring. 
SETI at Home | | A scientific experiment in which citizens run a free program that downloads and analyzes radio telescope data on Internet-connected computers in the Search for Extraterrestrial Intelligence (SETI). 





5.1.2. Market trends 


The growth of launch vehicle, satellite manufacturer, and spacecraft operator markets has led to an 
explosion of downstream activities. The space downstream market refers to the space applications and 
products provided to end users and forms an integral part of the space economy. These include 
direct-to-home satellite television services, satellite navigation consumer equipment, and value-added 
services, as well as small terminal providers for data handling and banking. The UK downstream market 
is expected to grow to GBP 40 billion by 2030 (Technology Strategy Board, 2014). 


5.2. Challenges for Applications 


As more public and private funds flow into commercial applications, the downstream application 
market is expected to quadruple in size over the next three decades (Technology Strategy Board, 2014), 
with new entrants to the market, referred to as NewSpace, further increasing possible areas of growth. 
This makes it even more important to identify the challenges facing SBD applications. The next sections 
address financial, market identification, transparency, legal, and policy considerations. 
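

To give a rough sense of scale, a market that quadruples over three decades is growing at a fairly modest compound annual rate. The short Python sketch below works through that arithmetic. The 30-year horizon and the assumption of steady year-on-year compounding are simplifications made here for illustration; they are not figures taken from the cited source.

```python
def implied_annual_growth(multiple: float, years: float) -> float:
    """Return the constant annual growth rate that yields `multiple` after `years`."""
    return multiple ** (1.0 / years) - 1.0


# Quadrupling (x4) over three decades (30 years), assuming steady compounding.
rate = implied_annual_growth(4.0, 30.0)
print(f"Implied annual growth: {rate:.2%}")
```

Under these assumptions the implied rate is a little under 5% per year, which suggests the forecast rests on sustained steady growth rather than a short-lived surge.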


5.2.1. Financial Challenges 


One important aspect of business viability is the ability to generate cash flow and maintain financial 
stability. External factors such as attracting investment, calculating costs, and addressing unstable 
revenue streams challenge for-profit applications. These risks are most apparent at the start-up phase. 
Challenges that occur throughout the lifecycle of a company include risk of currency devaluation, debt 
service, asset maintenance, and corporate policy changes. It is important to note that these challenges 
are not unique to SBD, and are similarly present in every emerging high-tech sector. The impact and 
scale of each of these challenges depend on the company’s business model. There are three main 
models for space application businesses. 


• Gathering space data from publicly available resources, such as space agencies or 
universities, and using it to develop solutions for end users based on market analysis. This 
model requires the lowest level of investment and has the shortest development period. The 
project can attract investment capital more easily and return the investment faster, since there 
are few if any barriers to the data at the core of the application. The majority of costs besides 
the initial application development lie in operation, with routine maintenance and updates 
representing the ongoing variable costs of the product. Revenue from these projects could be 
less stable initially due to the number of competitors and fast pace of market changes. 


• Securing space data from proprietary owners and creating an application that uses this data 
to attract customers. This model requires a longer research and development period and a 
larger investment, which is used to purchase data. Revenue depends on user demand for the 
proposed product. Considering challenges presented in the previous model will help define a 
strategic and tactical approach. 


• Generating space data using the company’s own infrastructure, either by launching a satellite 
or building a ground station, and then developing an application based on the collected data. 
This model generally requires the largest investment of the three business models due to the 
high up-front and long-term costs of manufacturing and operating a satellite. Longer 
development periods may affect the financial health of the project and result in lost business 
opportunities in a competitive market, but the model can offer large profits if successful 
due to the high demand and flexibility of satellite operation. 


5.2.2. Market 


While the market for SBD is growing, a large proportion of this development is a result of space 
applications. The key market challenges for SBD application providers are: 


• accurately aligning available data with user needs and forecasting which data will be relevant 
for future consumers based on current trends; 


• understanding that selected markets may not be large enough to support sustainable 
businesses from a revenue standpoint; and 


• accepting that users and markets may not immediately understand the benefits of SBD 
applications due to the nebulous nature of big data and the novelty of its use. 


There are difficulties defining and understanding specific space data application markets. The lack of 
understanding by those looking to develop and market an application can be frustrating to venture 
capitalists looking to invest (Werner, 2015). This reflects the gap that exists between experts in the SBD 
field, public opinion, and the market. Applications must be easily accessible to the end-user market 
if they are to become commercially attractive. 


5.2.3. Transparency 


Although SBD applications can offer benefits around the world, there is a lack of transparency about 
the types of existing applications and the ways they could be used. There is concern that big data 
applications may obscure the decision-making process (White House, 2014), and that individuals may 
lose control over how decisions are made (European Data Protection Supervisor, 2015). Currently, the 
methodology is shrouded in secrecy and trade protection. This can lead to adverse consequences for 
both the public and private sectors. It is therefore important to identify how results are obtained and 
how data applications function. 


5.2.4. Law and Policy 


Traditional legal issues relating to data include liability, data transfer, security, and insurance, whereas 
transparency and privacy challenges are more specific to SBD applications. There are considerable 
policy questions on balancing access to information with privacy concerns. This process encompasses 
data quality, data uses, data ownership, as well as the data lifecycle. Data accessibility is important 
because space applications generate new business and provide useful applications for daily life. Privacy 
is also important to protect personal data from being exploited or commercialized. One argument is 
that if such information were open, it would not be able to generate much revenue. 


There is a unique policy challenge relating to the balance between data openness and data privacy. 
There are considerations about open access data and ownership, as well as differences between 
standardized data and the contextual nuances of individual data sets. In one specific field, international 
development, incentives could encourage the international community to share data by default, 
improve quality control in the collection process, increase interoperability, identify emerging 
security needs, and use existing data for decision making (Raftree, 2016). 


Transparency in terms of open data has been recognized as important for both the public and private 
sectors. The Sunlight Foundation, an organization established to promote accountable and transparent 
governments, maintains that “setting the default to open is about living up to the potential of our 
information, about looking at comprehensive information management and making determinations 
that fall in the public interest” (Sunlight Foundation, 2016). The challenge here is again privacy. 


Another example of data openness and accessibility challenges is Google Earth, and its breakthrough in 
the field of space big data (Fenton, 2011). Google Earth, together with the related services Google 
Maps and Google Street View, makes the world transparent to every single person, a feat previously 
unimaginable. Increasing privacy invasion concerns and laws, however, have limited Google Earth 
services. Even though Google Earth satellite services are subject to fewer privacy lawsuits than 
Google Street View, countries are still very concerned about the satellite data that Google Earth collects. 
For instance, it is illegal for US remote sensing operators to sell high resolution remote sensing images 
of Israel (Geens, 2007). 


Application developers need to consider legal permissions surrounding the use of data. Individual 
privacy rights can be a challenge depending on the geographic location. For example, Orbital Insight is a 
company that uses Earth observation data to derive consumer trends from cars parked at most 
major retailers (Babenko, 2016). Using satellite imagery at 50 cm resolution, Orbital Insight 
enables companies to extract important information about their customers to target 
advertising. A Pew Research study revealed that “there [are] a variety of circumstances under which many 
Americans would share personal information or permit surveillance in return for getting something of 
perceived value” (Rainie and Duggan, 2016). However, this willingness is shaped by the conditions of the 
data collection and sharing, particularly the length of data storage and the possibility of data access by 
third parties (Rainie and Duggan, 2016). 


5.3. Conclusion 


It is difficult to approach the issue of SBD from a holistic viewpoint as there are many different interests 
related to data applications including scientific, legal, human performance in space, and engineering. 
SBD applications have the potential to create new scientific and business opportunities. It is an exciting 
time for space industry applications and the future has great potential for an industrial boom, but there 
are barriers and challenges to this expansion. The business climate thrives on an efficient and effective 
policy framework and knowledge of market trends, but investors in space applications are most often 
constrained by data transfer restrictions and prohibitions from one country to another. Developers 
and manufacturers of space applications face scrutiny for producing high quality technologies that 
could optimize space data use. This poses a threat to space application technology innovation and 
SBD use. 


6. Data Repositories 


6.1. What are data repositories? 


A data repository is a central location in which data is stored and managed. Data repositories are 
essential to the lifecycle of SBD. Manufacturers, processors, and applications all need data repositories 
because they enable high performance, easy-to-use, manageable SBD to be accessed and used. We 
need to understand how data repositories function, who owns them, where they are located, how they 
are accessed, what they provide, and how they are funded. 


6.1.1. Ownership and Operation 


Governments, organizations, and commercial entities that develop space missions and activities are 
usually the data owners of the resulting SBD from these activities. Stakeholders who are data owners 
usually build their own repositories due to concerns about security, data protection, and competitive 
advantage. In recent years, space data has trended toward an open model. ESA announced that it may 
diverge from the traditional closed model and share common space data with the public through a 
private cloud (Red hat, 2016). In China, observing data belongs to the Chinese Academy of Sciences 
(CAS), and some of the data could be shared using the National Earth System Science Data Sharing 
Infrastructure (GeoData, 2016). We expect a further shift from private repositories toward more liberal 
data access in the future. 


6.1.2. Location 


The data repositories of national agencies tend to be physically located within the country’s borders, 
while economic and political factors influence the location of private data archives or archives run by 
cooperative efforts. For instance, several data centers of China’s high resolution Earth observation data 
(CHEOS) were built in provinces of China, with each center taking responsibility for storing EO data 
(CHEOS, 2016). The NASA space science data coordinated archive (NASA, 2016) is located in the U.S., 
but the cooperative effort SIMBAD astronomical database is located in Strasbourg, France (SIMBAD, 
2016). Figure 11 describes the data centers covering most Earth observation disciplines, such as 
atmosphere, cryosphere, land, and ocean. These data centers make up a distributed system located 
throughout the United States. 


Figure 11 - NASA’s Discipline-Oriented Data Centers (NASA EOSDIS, 2016) 





6.1.3. Accessibility 


Open data is the concept of free access to and use of data. It is similar to open access, but the latter 
refers to the free use of scientific results, such as papers containing scientific knowledge, rather than 
data specifically. 


Each repository has its own policy related to data accessibility. For instance, NASA must keep all its data 
open, regardless of whether data repositories want their data to be shared. Some data repositories 
offer paid access, while others provide free access to the data. Owners may choose to keep their data 
proprietary, due to national security concerns, or for commercial advantage. However, the accessibility 
of data repositories within the space industry is unique due to the high volume of open data within the 
industry. 


Most data repositories rely on web-based services to provide access. Government-funded entities often 
require that data be freely distributed. National space agencies dominate the publicly available 
archives. In the case of NASA, the free distribution requirement is dictated by the U.S. Freedom of 
Information Act (NASA, 2015). The U.S. has the largest space budget in the world (OECD, 2011), and is 
able to provide the bulk of easily accessible data in online archives, mainly through agencies such as 
NASA, the National Oceanic and Atmospheric Administration (NOAA), and the U.S. Geological Survey 
(USGS). These agencies produce a wide variety of datasets such as astronomical data and climate data. 
On the European side, the European Southern Observatory (ESO) archives astronomical data while 
EUMETSAT provides climate data similar to that at NOAA (Eumetsat, 2016). 


6.1.4. Data types 


SBD can be classified into three types: structured, semi-structured, and unstructured. Figure 12 
describes the classification of big space data. 


Due to the rapid proliferation of SBD, it is very difficult to collect, store, and analyze SBD using 
commonly available databases or data analysis applications. The differences in data formatting are an 
additional challenge to the processing, analysis, and distribution of SBD to end users. 


Figure 12 - Classification of Big Data (ILNAS and ANEC, 2016). 


6.1.5. Functions 


In addition to storing SBD, repositories also provide metadata. Metadata is structured information 
about the data in a data archive that describes content, format, location, access privileges, and 
keywords. Metadata allows datasets to be screened and identified, as well as combined with other 
datasets when needed (NISO, 2004). 
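

The role of metadata in screening datasets can be illustrated with a minimal Python sketch. The record layout below simply mirrors the five attributes listed above (content, format, location, access privileges, keywords). The class name, field names, and example catalog entries are hypothetical and do not correspond to any particular repository's schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record; fields mirror the attributes named in the text."""
    content: str        # short description of what the dataset holds
    data_format: str    # file format, e.g. "FITS" or "NetCDF"
    location: str       # where the data lives inside the archive
    access: str         # access privileges, e.g. "open" or "registered"
    keywords: Set[str] = field(default_factory=set)


def screen(catalog: List[DatasetMetadata], keyword: str,
           access: Optional[str] = None) -> List[DatasetMetadata]:
    """Return records tagged with `keyword`, optionally filtered by access level."""
    wanted = keyword.lower()
    return [
        m for m in catalog
        if wanted in {k.lower() for k in m.keywords}
        and (access is None or m.access == access)
    ]


# Illustrative entries only; these are not real archive paths.
catalog = [
    DatasetMetadata("Sea surface temperature grids", "NetCDF",
                    "archive/ocean/sst", "open", {"ocean", "climate"}),
    DatasetMetadata("Stellar light curves", "FITS",
                    "archive/astro/lc", "registered", {"exoplanet", "photometry"}),
    DatasetMetadata("Ice sheet elevation", "GeoTIFF",
                    "archive/cryo/elev", "open", {"cryosphere", "climate"}),
]

climate_open = screen(catalog, "climate", access="open")
print([m.location for m in climate_open])
```

Because the screening step reads only the metadata, a user can identify candidate datasets to combine without downloading any of the underlying data.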


Many data repositories also provide support tools to help the user easily find, process, or understand 
data. An example is virtual observatories (VO), web-based services for scientific research that combine 
different software and data archives (IVOA, 2016). For example, EuroVO (Euro-vo.org, 2016) is a 
European collection of applications that can be used for working with and visualizing data from sources 
such as the ESO. Another example is ESA Near Earth Objects (NEO) coordination center, which provides 
tools for astronomers (ESA SSA NEO, 2016). NASA also provides some tools for educational purposes 
(NASA Science, 2016a). If end users want to use data to produce more specific or specialized results, 
they may need to conduct their own processing. 


6.1.6. Financing 


Data repositories are funded and developed by public money, private capital, or donations. Data 
repositories rent storage capability to customers and charge data search service fees to specific users. 


Building and maintaining a data repository website requires funding. This is in addition to the basic 
operational financial needs of each repository. These additional fixed costs are covered either by 
the same financial source as the organization itself, such as NASA’s budget for maintaining its 
databases (NASA, 2016c), or by a secondary financing system, such as subscriptions to the International 
Astronomical Union (IAU) (IAU Central Bureau for Astronomical Telegrams, 2016a) or membership 
fees, as for the Square Kilometre Array (SKA, 2016b). The financing system for these websites 
depends partly on the user market related to that specific database. For example, data used in 
scientific research will usually be available for free, but may require registration that includes details of 
the research institute and research purpose, as with NOAA’s Comprehensive Large Array-data 
Stewardship System (CLASS) environmental data (NOAA, 2016) or the astronomical data at the IAU. Using 
data for purposes beyond scientific research usually requires purchasing data from the provider. Some 
providers only require payment to cover the costs of maintenance. At the IAU, for example, users need 
only purchase updated data (IAU Central Bureau for Astronomical Telegrams, 2016b). Some data 
repositories sell their products as part of their business plan, as is the case for private companies 
such as Planet (2016a), Terra Bella (2016), and Spire (2016), among others. 


6.2. Challenges for Data Repositories 


The challenges facing data repositories relate to both the space data and metadata. Data repositories 
face a sharply increasing volume of data, even beyond their own processing ability. This issue can be 
divided into five challenges: standardization, accessibility, organization, security, and financing. 


6.2.1. Standardization 


Lack of data standards impacts data repositories because there is no single way of storing incoming 
files and no way to collect files from different sources. 


In an attempt to improve exchange and support data synchronization, data pools are created to serve 
as a common communication point between trading partners. This leads to data silos, where fixed data 
is under the control of a single entity. This makes integration and access difficult. Figure 13 illustrates 
space data silos. Customers therefore rely on different access mechanisms depending on the data, and 
the lack of common agreement makes combining data a tedious endeavor. While each silo establishes 
one part of the solution to a problem, a comprehensive solution requires a combination of data from 
silos. 


[Figure 13 image title: Big Data Architecture and Infrastructure - Challenges (Data Stack)] 

Figure 13 - Challenges of Siloed Big Data Architecture and Data (Chang, W., 2016) 





Similar data types are stored in different file formats. Datasets from different categories, such 
as the NASA archives Earth Observing System Data and Information System (EOSDIS) and the Legacy 
Archive for Microwave Background Data Analysis (LAMBDA), have very different formats. Even 
organizations working within the same area, like NASA’s Minor Planet Center and ESA’s NEO 
coordination center, have different ways of storing and organizing similar data. 


Standardizing how data is named presents a challenge for archiving. Ideally, the name of a data file 
should contain all relevant information compiled in a clear way. One source of this problem is that 
institutions prioritize relevant information differently and have nothing but hindsight to determine 
what data may be important for the future (Jones, 2016). 


A simple example is light curve datasets, which contain information about signal intensity over a 
period of time. They are used to create plots and detect planets using the transit method. Users can 
download light curves from NASA’s exoplanet database (NASA, 2016a), and will find differences 
between the data formats for light curves obtained from SuperWASP (Smith and WASP, 2014) and 
Convection, Rotation and planetary Transits (CoRoT) (NASA, 2016b). Output is in different formats with 
different metadata, and the data points report different flux measurements. A user who wants to 
compare light curves from the two datasets must convert one format into the other. Furthermore, the 
naming of the light curve files differs between the two databases: a nine-digit identification number 
versus a descriptive naming method that includes the location of the star contained in the file. 
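

The conversion burden described above can be sketched in Python. The two input layouts below are invented for illustration and are not the actual SuperWASP or CoRoT file formats: hypothetical "format A" stores flux in instrument counts, and hypothetical "format B" stores magnitudes. Both are mapped onto one common schema of median-normalized relative flux, using the standard relation between magnitude and flux.

```python
from statistics import median


def normalize_flux(times, fluxes):
    """Scale raw flux values so that the median equals 1.0 (relative flux)."""
    baseline = median(fluxes)
    return [{"time": t, "rel_flux": f / baseline} for t, f in zip(times, fluxes)]


def from_format_a(rows):
    """Hypothetical format A: dicts with 'time' and 'flux' in instrument counts."""
    return normalize_flux([r["time"] for r in rows], [r["flux"] for r in rows])


def from_format_b(rows):
    """Hypothetical format B: (time, magnitude) tuples on a logarithmic scale."""
    times = [t for t, _ in rows]
    # Magnitude-to-flux relation, up to a zero point that cancels on normalization.
    fluxes = [10 ** (-0.4 * mag) for _, mag in rows]
    return normalize_flux(times, fluxes)


curve_a = from_format_a([
    {"time": 0.0, "flux": 1000.0},
    {"time": 1.0, "flux": 990.0},   # small dip, as a transit might produce
    {"time": 2.0, "flux": 1000.0},
])
curve_b = from_format_b([(0.0, 12.00), (1.0, 12.01), (2.0, 12.00)])

# Both curves now share one schema and can be compared point by point.
for a, b in zip(curve_a, curve_b):
    print(a["time"], round(a["rel_flux"], 4), round(b["rel_flux"], 4))
```

Every pair of formats needs its own adapter of this kind, so the effort grows with each additional source, which is the practical cost of the missing standard.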


Metadata is critical to data storage as it can describe useful features such as the specific location of 
information or the units. The American National Standards Institute (ANSI) and International 
Organization for Standardization (ISO) are leading metadata standardization efforts, but many different 
standards exist and the interoperability between different systems remains complicated (Bruce and 
Daniel, Year unknown). An illustration of this challenge is U.S. geospatial metadata relating to many 
data types including maps, GIS files, imagery, and other location-based data resources (FGDC, 2016). 
Historically, the U.S. used a national standard called the Content Standard for Digital Geospatial 
Metadata (CSDGM), but with international standardization efforts, it has also adopted ISO 
standards for geospatial data. The two standards are similar but not identical, and are not written in 
the same format (FGDC, 2011). 


6.2.2. Accessibility 


Data repositories face challenges related to data hardware architecture. They must store and deliver 
data, and deliver updated data as soon as possible. Verifying and checking new data is a lengthy 
process, and may account for the potential reluctance to release it. This could lead to legal issues, such 
as property rights or security concerns, as data acquisition, storage, archiving, and metadata are closely 
related to accessibility. 


Determining whether data should be open is a key challenge to accessibility. Today’s internet 
architecture could support widespread sharing; however, some stakeholders support limited access. 
Scientists argue that open data would make science more efficient. However, the current 
publish-or-perish system, where scientists are pressured to be the first to publish, may work against 
open data sharing. These constraints and differing interests make data accessibility controversial 
(WIRED, 2016). 


The willingness of repository owners to share their information depends on their own interests and 
restrictions. Some offer for-profit access, while others provide free access to the data and hope to 
benefit from the knowledge and results. Some provide tools to help process the data. For instance, the 
ESA NEO coordination center provides tools for astronomers (ESA SSA NEO, 2016), while NASA provides 
tools for educational purposes (NASA Science for educators, 2016a). Other owners choose to keep their 
data for themselves due to national security or their own commercial security. Some organizations 
focus only on scientific goals. An example is the NASA Earth Exchange (NEX, 2016b), a platform for sharing 
knowledge and enabling research collaboration for the Earth science community (Nex.nasa.gov, 2016). 
Other entities support public use as well as scientific purposes, like the NASA Space Science Data 
Coordinated Archive (NSSDC, 2016b). Others still intend to deliver Archival Information Packages (AIP) 
to the general educated public (NSSDC, 2016a). This data is meant to be used by customers who have 
the knowledge and expertise to work with the data. 


Even when a website is accessible to the general public, it generally requires registration. The European 
Geostationary Navigation Overlay Service (EGNOS) is a good example: the signal is free but the dataset 
requires registration to be accessed freely (EGNOS, 2016a). Registration allows identification of the 
client, and this data may be sold to third parties or used to identify what data is most valuable to the 
person (EGNOS, 2016b). Other databases have citizenship restrictions. For instance, NASA Earth 
Exchange provides access to a so-called open public access source, but access is dependent on 
citizenship (NEX, 2016a). In the case of the Large Synoptic Survey Telescope, the primary purpose is to 
release information to the public, but its archive states that data will only be immediately available to 
astronomers from certain countries (LSST, 2016). 


Big data is often presented in a manner that users cannot understand, such as diagrams, tables, and 
complex procedures. Decision makers who need the information contained in big data become 
frustrated by the time it takes to get proper interpretation and analysis of big data. Users 
expect to be able to access information themselves, but want it in a form that is easily understood (SAS, 
2013). The question remains: how can SBD be transformed into useful knowledge and public 
applications? Data repository organization is therefore another key consideration. 


6.2.3. Storage Organization 


There is no single entry point or browsing tool for data repositories, exacerbating the huge challenge 
for data storage organization. Entities try to gather information with friendly interfaces and links to 
their multiple websites. NASA’s website states “[they’ve] created data.nasa.gov as a starting point to 
engage with our data, but this is simply a directory of all the wonderful data NASA makes available” 
(openNASA, 2016). However, this is not common. No powerful and user-friendly tool such as Google is 
available to browse the internet for SBD. People have to know where to search and may not know that 
better data is available. 


The challenge for storage organization lies in determining which data belongs to which category. Even if 
this problem is solved, there is a need for standardized archiving approaches. Every repository keeps 
data in some sort of structured archive. The ability to perform large-scale analysis on unstructured 
data exists at a few Internet companies, such as Google, Facebook, or Amazon (Dale, 2016), but not yet 
for SBD. 


Indexing is a way for the user to search and find data in large datasets. However, efficiently handling 
such amounts of data is a challenge and existing indexing technologies are not yet fully developed. 
With an increasing volume of data, indexing methods have to keep pace. Data users and the general 
public need to use big data, but it is not easy for them to find and use it. We need to find more 
effective methods so users can easily understand what the data is about and where they can use it. 
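

As a minimal illustration of what indexing involves, the Python sketch below builds an inverted index over short dataset descriptions and answers keyword queries against it. This is a generic textbook technique, not the method used by any repository cited here. The dataset identifiers and descriptions are hypothetical.

```python
from collections import defaultdict


def build_index(descriptions):
    """Map each lowercase word to the set of dataset IDs whose description contains it."""
    index = defaultdict(set)
    for dataset_id, text in descriptions.items():
        for word in text.lower().split():
            index[word.strip(".,;:()")].add(dataset_id)
    return index


def search(index, query):
    """Return IDs of datasets matching every word in the query (logical AND)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical dataset descriptions, for illustration only.
descriptions = {
    "ds-001": "Global sea surface temperature, monthly grids",
    "ds-002": "Arctic sea ice extent from passive microwave imagery",
    "ds-003": "Land surface temperature over urban areas",
}
index = build_index(descriptions)
print(sorted(search(index, "sea temperature")))
print(sorted(search(index, "surface temperature")))
```

A lookup touches only the entries for the query words rather than every description, which is why indexes scale better than scanning. The difficulty noted above is keeping such structures current and efficient as data volumes grow.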


6.2.4. Security 


Data security encompasses many different features: data integrity and availability, hardware security 
and software reliability of the storage facility, and all the data processing operations. The threats to the 
security of data repositories come mainly from two sources: internal and external threats. There are 
three internal threats. First, there is a risk to hardware infrastructure if hard drives are damaged, 
resulting in data loss. Second, bugs can threaten software infrastructure, leading to data loss or 
corruption. Third, human error can lead to accidental deletion. There are also four key external or 
environmental threats. The first is wars and natural disasters, which could cause large-scale damage to 
data storage facilities. The second is an unexpectedly long interruption of the power supply system. 
The third is strong electromagnetic interference. The fourth is cyberattacks. 
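

One common safeguard for the data integrity aspect is to record a cryptographic checksum when a file is ingested and to recompute it later. The Python sketch below shows the idea using SHA-256 from the standard library. It is an illustration under simple assumptions (the data is held in memory and the sample payload is invented), and it only detects corruption; it does not prevent or repair it.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected_digest: str) -> bool:
    """Return True if the data still matches the digest recorded at ingest time."""
    return sha256_digest(data) == expected_digest


# At ingest: store the digest alongside the file (an in-memory stand-in here).
original = b"telemetry frame 0001: temperature=21.5C"
recorded = sha256_digest(original)

# Later: a single changed byte (hardware fault, software bug, tampering) is detected.
corrupted = b"telemetry frame 0001: temperature=27.5C"
print(verify(original, recorded))
print(verify(corrupted, recorded))
```

Recovering from a detected mismatch still requires a redundant copy, so checksums complement rather than replace backups and replication.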


6.2.5. Financing Issues 


The development, implementation, deployment, and maintenance of data repositories necessitate 
financial resources and planning. The funding sources available to data repository owners are often 
determined by data type, volume, computing and storage requirements, and the target user’s profile. 
The issues that arise may be categorized as commercial and non-commercial. 


The downstream, commercial market for space data is undergoing rapid growth, with the advent of 
new satellite programs and constellations. An increasing number of private space companies create 
their own data repositories based on internal strategies to store, archive, and disseminate data. 
Therefore, data availability and reliability depend entirely on the amount of money that 
end users can pay to access it. Depending on their profile, users will have limited or full access. Planet 
Labs (2016b) is one example of this growing market. Its financing model is based on a profit interest 
(Planet, 2016c). This can lead such companies to choose solutions like Amazon Web Services (2016), but 
it raises questions regarding the robustness and the security of the archives. 


The funding for non-commercial data repositories is currently focused on short-term missions and is 
subject to politics. The challenge here is how to alleviate the dependency on government funds 
(Kitchin, Collins, Frost, 2015). Thus, there is a growing need to raise money from other sources, such as 
end users, and provide different access levels depending on their financial contribution. 


6.3. Conclusion 


Data repositories provide data storage and maintenance, as well as information about the data itself. 
We conclude SBD repositories are not fundamentally different from any other non-space sector in 
terms of functionality. However, space data repositories are unique in terms of accessibility and 
funding. 


With the emergence of SBD, challenges for data repositories include standardization, storage 
organization, accessibility, and financing issues. It is interesting to note that these challenges may 
likewise not be unique to the space industry; they also appear in other terrestrial industries, 
such as data regarding the high seas. 


7. Core Challenges 


In the previous chapters, we reviewed the main stakeholders in the SBD value chain, as well as the 
challenges associated with each. We found that there were key themes regarding the challenges each 
stakeholder faced. Due to the broad nature of these themes, they also represent challenges spanning 
all of the different core disciplines. To effectively understand and resolve them, we must approach 
these from different angles. Thus, we break down the challenges into four sections: the market itself, 
engineering, standardization and interfacing, and openness versus privacy considerations. 


7.1. State of the SBD Market 


For a commercial entity using SBD, difficulties in growing the business and its profits stem from 
several core issues: the high cost of data production, the fact that the space 
sector is a niche market, and the inability to generate meaningful results easily. While there is more to 
business than profit, it is generally true that profits expand an industry and encourage new companies 
or stakeholders to take part. Although the commercialization of space big data has increased immensely in 
the past decade, the market remains in a constant state of evolution. One of the most interesting points 
about SBD is that it is not only a product in its own right; it can also grow through its application to 
other pre-existing fields. Although this is itself a challenge, it also reveals a gap between the current 
use of SBD and its potential. 


7.1.1. Costs and Barriers to Entry 


Each of the stakeholders in the SBD value chain is affected by the current high costs of space-based 
data systems. The advent of CubeSats and small satellites has reduced the costs in recent years, but the 
private industry has not yet hit critical mass: commercial interests have only recently replaced 
government stakeholders (Vecchi and Brennan, 2015). Despite commercial entities’ entrance into the 
NewSpace market, costs across the entire lifecycle of SBD remain relatively high. 


As government entities such as NASA were the main stakeholders prior to spaceflight 
commercialization, the technologies used did not follow the development cycle that a commercial 
stakeholder would have used. Since commercial spaceflight firms use the same technologies today, the 
business models of the new private stakeholders differ from what they might have been had the 
technologies been developed commercially. The average SpaceX Falcon 9 costs $60M (Grush, 2016) to produce, 
and these costs are eventually passed on to downstream consumers to enable profitability. Even 
with the reduction in the cost of the satellites themselves over the last few years, the costs associated 
with launching a satellite have not decreased greatly. 


7.1.2. Market Size 


Another way to reduce the cost of a product, space related or not, is to increase the quantity of 
the good produced (Investopedia, 2003). To ensure higher production of a good is financially viable, 
there must also be an increase in demand. In today’s space industry, satellite launches are still 
relatively few, and despite the new technological developments, there are few chances for scaling 
satellite production. 
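

The scale effect described above can be illustrated with a simple average-cost calculation. The following is a minimal sketch, assuming a cost model of one fixed cost plus a constant variable cost per unit; the function name and all figures are hypothetical and are not taken from any of the cited sources.

```python
def average_unit_cost(fixed_cost: float, variable_cost: float, quantity: int) -> float:
    """Average cost per unit when a fixed cost is spread over `quantity` units."""
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    return fixed_cost / quantity + variable_cost


# Hypothetical figures: 50M fixed cost (design, tooling) and 2M variable cost per satellite.
for units in (1, 10, 100):
    print(units, average_unit_cost(50e6, 2e6, units))
```

Under this assumed model, the average cost per satellite falls as the production run grows, but only if demand exists for the additional units, which is the constraint the paragraph above identifies.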


Despite the high costs of generating space data, applications that use SBD are becoming more 
common. However, due to the high cost of producing data and the aforementioned technical challenges of 
processing and locating data, effective application of data outside the initial satellite mission is very 
difficult. The combination of low production volumes, due to a small market, and high costs has created an 
industry in which it is difficult to survive. 


CASE STUDY: Chinese New Strategy for SBD Management 


China is an emerging economy impacting many global industries, and the Chinese space industry offers 
important lessons in space big data management. In recent years, China has evolved its strategies from 
segmented government institutions having control over individual datasets to a national data sharing 
mechanism that fosters economic growth (Chinese Government, 2015). 


China has established several programs for civil applications such as oceanic, weather, and land 
satellites (CNSA, 2014a). The responsible agency decides mission requirements and daily satellite 
operations. To widen Chinese remote sensing applications and enhance the country’s remote sensing 
industry capabilities, the Chinese government developed the Chinese High Resolution Earth 
Observation System (CHEOS) in 2010. CHEOS will provide global application services in mapping, 
agriculture, disaster response, resource management, forestry, and environment management. This 
system includes space-based, near space, and airborne platforms as well as ground bases and 
application systems. Trials of fusing data collected from multiple platforms are underway, with full 
operation expected by 2020. 


More than 1800 companies with 413 special products use high resolution Earth observation satellites in 
18 industries across China (CNSA, 2014b). Data application centers built in 20 provinces of China take 
responsibility for transforming Earth observation data into information and knowledge, and for 
providing services. They also act as a medium for establishing comprehensive applications with 
multi-source data for domestic and international user needs. More than 26 provinces, including Beijing, 
Xinjiang, and Guangdong, have established a provincial data and application center based on high 
resolution Earth observation data. As of now, they have delivered 239 special products and 131 service 
products (CNSA, 2014b). In 2003, CNSA and the Russian space agency Roscosmos considered 
exchanging data from similar types of satellites. In 2013, CNSA offered Pakistan satellite data to support 
the rescue mission after the earthquake (CNSA, 2014a). 


CASE STUDY: Connecting the Public and Private Sectors 


The challenge of open access to government and commercial space missions lies in bridging the gap 
between bureaucratic agencies and a proactive commercial sector. ESA provides open data from 
the Copernicus program. With this action, the EU showed that it would prefer to see incubation 
centers work with the NewSpace sector. In addition, it would like to see other industries take more 
interest in the space technology industry and understand the value of space missions. However, there has 
been no progress in raising awareness of the need for subsidies for institutions processing space big data. 
Scientists, as well as potential development opportunities, suffer as a result, while the commercial sector 
tries to fulfil customers’ needs through other technologies that are not fully equivalent but more 
accessible, which stimulates growth within those industries instead. The commercial sector is less 
interested in scientific data applications in the space big data domain and, as a result, scientists suffer 
greatly from the lack of applications. The value of scientific data is degrading due to strong competition 
from all other application domains. This problem is closely linked to the privacy vs. openness 
considerations below. 


At the BiDS 2016 conference in Tenerife, Spain, frustrations were expressed at the lack of skilled 
people, such as data scientists who understand the SBD acquisition process (Manieri, Costea and 
Florin, 2016). Agencies have pushed several EU universities to offer master's degrees in data 
science and to create links between those universities, space incubators, and commercial companies. 


7.1.3. Concluding Remarks 


The SBD market is in a state of constant evolution. While cost, barriers to entry, and market size are 
primarily challenges associated with business and commercialization, profitability relies directly on the 
quality of the product being sold. To respond appropriately to these challenges, we must understand 
how they relate to developing technologies and the political environment. 
Thus, for the SBD sector to develop, it needs to be examined on its own terms rather 
than understood as just another sector. Space involves many political considerations because space 
operations function at both governmental and commercial levels and take place in 
an international domain. Data can be collected that may benefit, or work to the detriment of, the 
entire world. Thus, it is a politically sensitive sector. In addition, from a technological perspective, 
technology that gathers data on Earth is inherently different from technology that collects 
data from space and about space. This matters not only in terms of the technology needed, but also in 
terms of costs. Therefore, the market considerations for SBD must be assessed in a very specific 
manner. 


7.2. Engineering 


The critical challenges regarding the engineering aspects are summarized below. They include downloading, 
distribution and processing, and data mining and extraction. 


7.2.1. Data Download 


Data download is a main concern for LEO satellites due to the limited visibility from their ground stations 
and their limited download bandwidth compared to the data generated per orbit. Most data is 
generated on the spacecraft, and the visibility window for a specific ground station is typically 
around ten minutes. A major constraint on mission planning is that data cannot be downloaded to the 
ground in real time, partly because ground station coverage is not sufficient. Most of the time, the 
ground antenna is waiting for the specific spacecraft to pass over. Ground station capabilities should be 
open resources that can be traded on a dedicated downloading service market. The radiofrequency link 
budget, the supported time windows, and the modulations can be made public to space operators to 
help them download mission data. Security issues may arise, but with modern encryption techniques, 
data integrity and privacy can be well protected. Software defined radio (SDR) techniques can be 
adopted to modify ground station receivers and make them flexible enough to serve different spacecraft. 
Countries in the same international alliance, such as the North Atlantic Treaty Organization (NATO), 
already cooperate to some extent, but a worldwide sharing service may need more political and regulatory 
support. 
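

The mismatch between data generated per orbit and what a single pass can carry can be estimated with simple arithmetic. The following is a minimal sketch, assuming a constant link rate over the whole pass; the ten-minute window comes from the text above, while the 100 Mbit/s rate, the 160 Gbit data volume, and the function names are hypothetical values chosen only for illustration.

```python
import math


def pass_capacity_bits(rate_bps: float, pass_seconds: float, efficiency: float = 1.0) -> float:
    """Bits that one ground-station pass can carry at a given link rate."""
    return rate_bps * pass_seconds * efficiency


def passes_needed(generated_bits: float, rate_bps: float, pass_seconds: float,
                  efficiency: float = 1.0) -> int:
    """Number of passes required to downlink `generated_bits`."""
    capacity = pass_capacity_bits(rate_bps, pass_seconds, efficiency)
    if capacity <= 0:
        raise ValueError("link capacity must be positive")
    return math.ceil(generated_bits / capacity)


# Hypothetical mission: 100 Mbit/s downlink, 10-minute pass, 160 Gbit generated on board.
capacity = pass_capacity_bits(100e6, 600)
print(capacity / 8e9, "GB per pass")
print(passes_needed(160e9, 100e6, 600), "passes needed")
```

A real link budget would also account for elevation-dependent rates, protocol overhead, and acquisition time, which the `efficiency` factor only approximates. The sketch nevertheless shows why access to additional ground stations, as proposed above, directly increases the volume of data that can be brought down.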


CASE STUDY: Data download and distribution requirements from the ISS 


The ISS is an important program for Earth observation because it is a data bank for a large number of 
scientific studies. It provides a unique chance for both science and business communities to conduct 
scientific experiments and collect information. Data from the ISS is currently downloaded either directly 
to the ground or through the Tracking and Data Relay Satellite System (TDRSS). Due to the steep rise in data 
handling requirements, the current internal communication network on the ISS may not be able to 
handle the data that will be generated. The TDRSS system was designed for telemetry, tracking, and 
command for LEO satellites. Though the system has been working well, the missions using its services and 
the increased bandwidth requirements of individual missions saturate TDRSS. Its communication capacity 
is a maximum of five LEO satellites, and this capacity is too low to meet present requirements. 
To fully realize the potential of SBD, flexible communication capability must support higher data download 
rates and access. An efficient global data distribution network will meet this requirement. 


7.2.2. Data Distribution 


Once ground stations download data, the main challenge is to distribute it among users. The users are 
scientists, researchers, or the general public. The challenge of a data dissemination system is to provide 
data to various users and bridge the gap between the providers and the potential users. 


7.2.3. Data Processing 


The main purpose of the computing and data processing system is to provide processed data to end 
users. Remote sensing applications are data intensive, requiring large amounts of data to be processed 
very quickly. Data processing is a real challenge with multi-spectral, high resolution, real time data in 
different formats and rates. The challenge in space big data is computation and storage of data. 


7.2.4. Advanced Space Data Mining 


Data mining is a key challenge of the big data industry. It bridges the gap between raw data and the 
various customers that need valuable information. Only with powerful methodologies can applications 
be developed that bring revenue to investors. In Earth observation or reconnaissance missions, for 
example, sophisticated and diverse image analysis still relies heavily on experienced human 
technicians. A breakthrough in advanced mining methods will be the catalyst for the SBD industry. 


CASE STUDY: Google based data mining 


Google, the very successful search engine, deals with a huge amount of data, but would it be possible 
to use similar technologies for mining space big data? The Google search engine is mainly based on text 
information in websites. Text processing is a relatively fast and mature technology, but most search 
engines rely on metadata for image browsing. Image metadata can be generated automatically from tags or 
contextual hints. Higher level image search is provided by content based image retrieval 
(CBIR) systems. State-of-the-art commercial CBIR systems such as TinEye can find images similar to an 
input reference, but computer vision, or visual perception of images, is still an obstacle for computers. 
Raw space data usually comes in the form of digital signals, images, or videos, and very rarely as text. 
A semantic search engine cannot be used in most cases, and Earth observation data consists of 
images with limited metadata from the ground. Traditional data mining techniques are therefore not 
efficient at solving the space big data problem. 
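

To make the idea of content based image retrieval concrete, the following is a minimal sketch that ranks archived images by similarity to a query image using intensity histograms and histogram intersection. This is a deliberately simple stand-in and is not how TinEye or Google operate; the tiny grayscale images, the scene names, and the function names are all hypothetical.

```python
from typing import Dict, List, Sequence

Image = Sequence[Sequence[int]]  # grayscale pixels with values in 0..255


def histogram(image: Image, bins: int = 4) -> List[float]:
    """Normalized intensity histogram, used here as the image's content feature."""
    counts = [0] * bins
    total = 0
    for row in image:
        for pixel in row:
            counts[min(pixel * bins // 256, bins - 1)] += 1
            total += 1
    if total == 0:
        raise ValueError("image must contain at least one pixel")
    return [count / total for count in counts]


def similarity(a: List[float], b: List[float]) -> float:
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(x, y) for x, y in zip(a, b))


def rank(query: Image, archive: Dict[str, Image], bins: int = 4) -> List[str]:
    """Archive names ordered from most to least similar to the query image."""
    q = histogram(query, bins)
    scores = {name: similarity(q, histogram(img, bins)) for name, img in archive.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Hypothetical 2x2 "scenes": dark ocean, bright desert, and a mixed coastline.
archive = {
    "ocean": [[10, 20], [30, 15]],
    "desert": [[200, 220], [240, 210]],
    "coast": [[10, 220], [20, 230]],
}
print(rank([[12, 18], [25, 5]], archive))
```

The retrieval here needs no text metadata at all, which is the property that matters for Earth observation imagery. Its weakness is equally visible: a histogram captures overall brightness but nothing about shape or meaning, which is why the text above calls for more advanced mining methods.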


7.2.5. Concluding Remarks 


While technology is the foundation of the space industry, it also represents major challenges to its 
development and growth. Technology is constantly evolving, and because the sector relies on many devoted 
scientists and engineers, these two disciplines are at the forefront of resolving these challenges. 


7.3. Interoperability 


Interoperability is a focal point of big data conversations, and the challenge extends to SBD. 
Interoperability connects to challenges from other categories, but must be addressed on its own. 
Standardization across the value chain is a challenge that is often discussed, but not yet completely 
resolved. Standardization is required to define, interoperate, share, transform, and manage data. It 
offers many advantages for improving the quality of work on SBD and saving resources. 
Standardization is important to maintain commonality between different industrial sectors; to provide 
clarity between government and industry; to avoid vendor lock-in (a customer being forced to keep buying 
from the same vendor because of proprietary constraints); to ensure data quality; and to let users 
retrieve all the relevant data. Essentially, developing standards is the basis for an open and successful 
big data market. 


CASE STUDY: Standardization Requirements 


The increase in Earth observation data creates a need to provide data quickly. Since data fusion has 
become a necessity for many applications, policy makers, scientists, and industry representatives want 
to make Earth observation data accessible to a wide user community. This interoperability can only be 
achieved by standardizing the data. Currently, individual data architectures are unable to 
accommodate the exponentially increasing volume of space data. Analysts spend 80-90% of their time 
preparing data (locating it in various files and databases, changing formats, linking, and filtering) 
instead of analyzing it (Brown, 2015). The lack of standardization also degrades data quality, which has 
negative effects on businesses. Standardization therefore has a major impact on productivity and profit. 


7.3.1. How to Implement Standardization 


Implementing and developing technical standards to maximize compatibility, interoperability, 
repeatability, and quality requires a universal framework. Standards bodies develop, coordinate, 
promulgate, and generate standards, and include the International Organization for Standardization 
(ISO), the International Electrotechnical Commission (IEC), and the International Telecommunication 
Union (ITU). Other bodies deal with specific standards, such as the Institute of Electrical and Electronics 
Engineers (IEEE), the World Wide Web Consortium (W3C), the Open Geospatial Consortium (OGC), and the 
Organization for the Advancement of Structured Information Standards (OASIS). These bodies may be used to 
implement universal standardization. 


Thus, standardization is not only a technical challenge, but also requires legal and policy actions to 
support or enforce the use of certain standards. Standards should comprise technology standards, such as 
network and communication protocols and data aggregation standards, as well as regulatory standards, 
including those for security and privacy of data. 


7.3.2. Metadata Standardization 


Metadata provides descriptive information about a dataset, object, or resource, including how it is 
formatted and when the data was collected. To promote interoperability, metadata needs to be 
standardized. Metadata standardization allows integration of multiple data sources and maintenance 
of data quality, which in turn addresses the veracity and value of data. It enables access to different data 
available on different platforms and helps achieve business and consumer goals. Finally, it helps with 
networking across multiple systems in multiple domains and platforms, and simplifies and reduces the 
cost of data exchange (ISO and IEC, 2015). 
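

The integration benefit of metadata standardization can be illustrated by mapping records from two data providers onto one shared schema. The following is a minimal sketch; the provider names, field names, and the three-field schema are hypothetical and do not correspond to any published metadata standard.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Union

# Hypothetical provider-specific field names mapped to a shared schema.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "provider_a": {"acq_time": "acquired", "fmt": "format", "sat": "platform"},
    "provider_b": {"timestamp": "acquired", "file_type": "format", "spacecraft": "platform"},
}


def to_utc_iso(value: Union[str, int, float]) -> str:
    """Convert epoch seconds, or an ISO 8601 string with an offset, to UTC ISO 8601."""
    if isinstance(value, (int, float)):
        moment = datetime.fromtimestamp(value, tz=timezone.utc)
    else:
        moment = datetime.fromisoformat(value)
        if moment.tzinfo is None:
            raise ValueError("timestamp must carry a UTC offset")
    return moment.astimezone(timezone.utc).isoformat()


def normalize(record: Dict[str, Any], provider: str) -> Dict[str, Any]:
    """Rename provider fields to the shared schema and normalize their values."""
    mapping = FIELD_MAPS[provider]
    common = {shared: record[native] for native, shared in mapping.items()}
    common["acquired"] = to_utc_iso(common["acquired"])
    common["format"] = str(common["format"]).strip().lower()
    return common


# Two hypothetical records describing the same kind of product in different ways.
a = normalize({"acq_time": "2016-08-01T12:30:00+02:00", "fmt": "GeoTIFF", "sat": "SatA"},
              "provider_a")
b = normalize({"timestamp": 0, "file_type": " geotiff ", "spacecraft": "SatB"}, "provider_b")
print(a)
print(b)
```

Once both records share field names, a time reference, and a format vocabulary, they can be filtered and joined with a single query. Without an agreed standard, a mapping such as `FIELD_MAPS` has to be written and maintained for every pair of providers, which is part of the data preparation effort described in the standardization case study.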


7.3.3. Concluding Remarks 


Standardization plays a pivotal role in the SBD value chain to provide customers with the ability to 
select comparable and compatible data from multiple producers. 


Standardization has to be the common language among all the stakeholders of SBD. Without it, SBD 
cannot be shared effectively and further processed in an active data value chain. Standardization can 
improve the quality of work on SBD and save resources. Stakeholders should pay careful attention to 
reaching a broad agreement on a standardization system dealing with all aspects of the value chain. 


7.4. Privacy vs. Openness 


Openness and privacy are related challenges in big data. The need to promote liberal data use must be 
balanced against the need to enable entities to keep data private. This issue has become more apparent due 
to the transition from government centric SBD to a balance of publicly and privately generated SBD. The 
commercial world now has a strong interest in SBD for profitability, and may not welcome openness as 
readily as a government agency. 


7.4.1. Openness 


Openness is an essential element of big data use and interfacing. The easier it is to access information, 
the more likely it is to be used and connected with other sources to produce solutions. Data availability 
has been a concern for government, scientists, and businesses alike. That said, it also presents a 
challenge in relation to the humanities because different cultures and countries have different values and 
ethics. Navigating this difficult challenge requires input from the different disciplines. 


Former chief scientist of Amazon Andreas Weigend bluntly stated: “Data is the new oil” (Mid Market 
Pulse, 2016). Luo Rui Lan, IBM Chief Executive Officer said: “All data will be among the industry to 
determine the winner of the fundamental factors, the final data will become human vital natural 
resources” (Mid Market Pulse, 2016). Unlike oil, though, data is reusable and sharable. Sharing data 
does not use it up; rather, it produces more data, or more complex data. As a result, the 
more open big data is, the more use can come out of it, and by extension the more value it can 
generate. Nevertheless, sharing data also means sharing control of this data. Once information is 
released, its flow depends on the users that have access to it. This means that any information can be 
revealed without permission. This raises concerns when the circulating information is private, meaning 
subject to the permission of its original owner or generator. 


As far as SBD is concerned, there are several main motivations for data sharing. The first motivation 
comes from space law and policy. According to Article One of the Outer Space Treaty (UNOOSA, 2016), 
“The exploration and use of outer space ... shall be carried out for the benefit and in the interests of all 
countries.” Data obtained from or about space can enhance our knowledge of space, help assure the safety 
of the Earth, provide insight into the origin and fate of the cosmos, or help us search for another 
habitat. The second motivation is associated with the high cost of space activities. For example, 
NASA’s budget amounted to USD 17.6 billion in 2014 (Earth Space News, 2014). In consideration of 
the extremely high investment, NASA has joined with international partners to cooperate in space 
activities. Through this method, each partner funds its respective contributions and the data obtained 
is shared with all the partners involved. The third motivation comes from technological 
requirements. Space activities involve complex tasks that are based on accurate knowledge of space. 
To ensure accuracy, data from different sources, at different levels, and in different formats should be 
integrated to produce a comprehensive result. This integration requires data owners to 
share data with each other. In asteroid monitoring, for example, data is combined from many sources, 
such as space-based telescopes, amateur astronomers’ terrestrial telescopes, and observatories. 


Open data can improve accountability but it can also increase liability. There is acknowledgement that 
data privacy and security is a problem that can affect everyday life. Information is flowing everywhere 
and space data has many applications that touch many aspects of everyday activity, such as 
navigation and connectivity. We are not yet able to identify the challenge in its true dimension, and it 
will probably become more complex, especially as individuals globally rely more on their digital identities. 
The balance between data liberalization and protection of privacy is the next challenge for legislators. 


The key challenge regarding data openness is how it can be technically achieved and how we can 
convince the various stakeholders to share their data. For instance, freedom of information acts 
require governments to provide information regarding their activities (Open government guide, 2016b). 
In the US, the Freedom of Information Act (US code, 1967) mandates the partial 
or full disclosure of private governmental documents, subject to a few limitations. 


7.4.2. Privacy 


Privacy has two elements with regard to SBD. The first is the obligation of entities to comply 
with the appropriate privacy regulations in order to protect the privacy rights of end users and the 
public. For instance, where there is a need to log into a service, the entity must ensure that its handling 
of the log-in details complies with privacy law. Likewise, companies must ensure that when they collect 
space data, such as via remote sensing, they comply with privacy laws. The second is the right of an entity 
to keep its own data private. For instance, when companies gather or process this data, they have a right 
to keep it confidential. 


Both are affected if there is an obligation for data openness. Privacy is recognized internationally as a 
fundamental human right and is enshrined as a basic principle in major international agreements, such as 
the Universal Declaration of Human Rights and the International Covenant on Civil and Political Rights, 
and in the constitutions of more than 100 countries across the globe (Open government guide, 2016a). 
Protection of privacy is also part of domestic and regional law, such as the EU General Data Protection 
Regulation of 2016 and the US Privacy Act of 1974 (Public Law, 1974). The difficulty is that every country 
has different privacy laws. However, this is the same challenge that faces every sector across every 
area of law. This means that an entity that requires ‘data openness’ will not be able to oblige other 
entities to share their data unless there is a policy to support it. It also means that end users and the 
general public are still entitled to their privacy rights. We have a comprehensive system of privacy law; 
it does not run contrary to data openness, it simply requires that the use of the relevant data is 
proportionate. 


The only issue is that if a law requires data to be open, there must be a legitimate need for 
this. Satisfying end users' privacy rights is not in itself the issue; the 
challenge is rather that many companies are not aware of their legal obligations towards end users 
and the general public. 


CASE STUDY: Google Street View 


“Today's satellite-image technology means that...complete privacy does not exist” (Musil, S. 2009). 
Without Google Earth and similar applications, people around the world would not have access to 
such a huge volume of satellite imagery. As far as data openness is concerned, Google Earth is undoubtedly 
a breakthrough. Google Earth, together with related services such as Google Maps and Google Street 
View, makes the huge and complicated world transparent to every single person, which 
was unimaginable before. A person in any corner of the world can see any other part of the world vividly, 
as if hovering above that place, without knowing which satellite the image comes from or how 
the data is processed. 


Complaints about Google are focused on its Street View. Google Street View is supplemented 
by remote sensing, global positioning systems, and other space big data. The privacy issue of Google 
Street View is therefore essentially one of space big data. Since the resolution of remote sensing 
satellites is becoming higher, people may take issue with high resolution remote sensing imagery in the 
coming years. For example, as mentioned earlier, remote sensing satellites with a resolution finer than 
0.1 m may soon become a reality. Regardless, Google Earth and Google Street View collect personal 
information. Personal information is information that can be related to any person in any way; if you 
see a red car outside someone's house, this is personal data because, via the internet, the address of the 
house can be identified and linked to an individual. 


Privacy law functions well. The courts are a method of enforcing legal rights. Google has been fined over 
Google Street View issues throughout Europe and in Canada, and as a solution Google blurs out the 
data at issue. This shows that even Google has had trouble understanding its legal obligations, and that 
sharing certain data, such as license plates, is disproportionate and not necessary for the purposes of 
Google Earth. The key is proportionality. If companies have a legitimate purpose and follow appropriate 
legal safeguards, the data may be shared. The courts are there to act as a safety net, to ensure 
companies are complying with their legal obligations. The key challenge here is that there is no legal 
obligation to make data open. 


7.4.3. Concluding Remarks 


The right to privacy and the right to information create a balance between confidentiality and 
openness. 


The privacy rights of end users and the public must be respected. SBD does not 
only refer to data itself, but also includes the hidden information we may extract from the huge pile of 
data from different sources using powerful computation and intelligent methods. As a result, in some 
scenarios raw data could be of low information value, but its combination through SBD processing 
could result in personal data. Companies must consider personal data regulations, which essentially 
require that any collection of personal data is proportionate and that the subjects of the data 
are granted certain rights. While this is less of an issue for larger companies, it may be difficult for 
startups with limited financial resources to fulfill their privacy obligations. It is also an issue for 
larger companies as the law itself is ambiguous. The courts will provide a solution and define legal 
obligations. 


Due to the characteristics and high value of SBD, there are many motivations to open up access to it. 
Data owners and data users are both looking for effective ways to open and share data in order to 
maximize its use and boost the related information markets. Openness is limited by certain political 
considerations, such as national security. Solutions to data openness and privacy problems come from three 
areas: legislation and policy, technical means, and ethics and self-discipline. There is no legal obligation 
to make data open; how this may be done, and how it could be enforced, are therefore challenges. 


7.5. Conclusion 


Due to state of the art technologies and growing trends in the market, SBD keeps evolving and the 
challenges are growing. The challenges of SBD are categorized into four areas: markets, engineering, 
interoperability, and openness versus privacy. The first challenge in the market is to understand the 
business, how to commercialize the data, and how to comply with the existing legal framework. Handling 
data in real time, data downloading, processing, distribution, and storage are the most critical issues 
regarding the technological aspects. Interoperability and access to data of reliable quality require 
standardization of data, which has to be resolved in the future to enhance business outreach. 
Openness and privacy are related and have to be handled with the utmost care. The challenge is to strike 
a balance between them to enable effective use of SBD. We detail potential roadmaps to deal 
with these challenges in the following chapter. 


8. Recommendations and Roadmap for Space Big Data 


A key objective for this project is to produce actionable recommendations that will assist decision 
makers and influence the future direction of SBD. We produced a roadmap for handling the challenges 
identified in the previous chapter to take the SBD industry from its current state to the desired future 
state. In this chapter, four roadmaps are presented, each corresponding to one of the four global space 
big data challenges we have identified above: market, engineering, standardization and interfacing, and 
privacy. 


8.1. Roadmap to Handle the SBD Market 


This roadmap aims to enable an SBD ecosystem, similar to the roadmap for Europe that Becker et al. 
(2016) proposed for Big Data, as a method to address the market challenges. Within this lie 
considerations relating to technology, interoperability, and privacy that are further outlined below. This 
chapter describes briefly the solutions, with focus on the parameters to track implementation and how 
to get from the current state to the proposed future state. 


For a commercial entity using SBD, the issues associated with growing the business and its profits are 
multiple. They stem from the core issues of high cost of data production, a niche market, and the 
inability to easily generate meaningful results. To best expand and support the development of the 
commercial space sector and to build a successful SBD ecosystem, we must address each of those. This can be 
done from different perspectives, mainly related to business, engineering, and policy. Interestingly, 
these solutions are all interrelated: by implementing one solution, the other two will 
naturally follow. Therefore, they can be performed in any order or priority. 


8.1.1. Business support 


From a business point of view, the way to effectively expand the space big data market should begin by 
analyzing the industries with potential opportunities for profit. While the current state of open data 
does create an environment conducive to new applications of space data, there must be a shift in the 
perception of space data. Because most developments of space big data technology have been the 
result of government missions, there is often no specific focus on creating profitable and flexible 
technologies. 


Governments and large industry players should normalize the use of space big data in their
decision-making processes. Government agencies should use space big data to offer businesses the
chance to provide services. The resulting influx of business opportunities from more public-private
partnerships will encourage new players to enter the market at all levels of the space big data value
chain. As profitability increases, competition between companies will intensify, creating a need to
differentiate. This can be done either on price or on quality (Joseph, 2016). Because of the high
costs in the space data manufacturing segment, quality of delivery and technology offer the best
opportunities for new players to stimulate the market.


8.1.2. Technology support 


As the practical use of space big data is rapidly expanding and has proved its ability to positively impact 
many other industries, governments should seek to employ space big data in a larger number of 
projects. For example, investing in SBD technologies that can improve government works, such as
infrastructure and city planning, will allow experts to focus on SBD-specific technologies
rather than gleaning technology that could be applicable to SBD from other industries.


Government investment in SBD-related areas such as AI, machine learning, satellite capabilities, data
interfaces, data storage, data application, and data downloading will lead to a sharp increase in
profitability as it
comes to fruition. Additionally, the increase in government funding will encourage new players to join 
the field and create an ecosystem of competition and technological improvement. 


8.1.3. Policy support 


If the political arena surrounding space big data is not conducive to thriving and expanding businesses, 
then the entire industry’s growth is in jeopardy. 


While NASA has created an open inventory of its space big data for public use, policies surrounding the
definition of private and public data are not clear. As a result, there are additional risks in using
certain types of data for commercial benefit. A very clear policy on which data should be public and
which private will help to alleviate those risks. SBD needs a legal framework that reduces the risk
to new players entering the market.


Creating this legal framework will encourage commercial expansion in the SBD application market. It
will help to overcome barriers and simplify legal considerations when generating and disseminating
raw and processed data. It could also help to overcome certain challenges to international SBD
openness and sharing, which are discussed in a later section.


8.2. Roadmap to handle SBD Engineering Challenges 


8.2.1. Data download 


The downloading challenge mainly concerns LEO data, and there are three axes of research that can be
examined.


The first is to reduce the data volume to the minimum level needed. New onboard pre-processing
techniques can reduce the sampling rate and filter noise, while various compression algorithms can
maximize the download link efficiency. The state-of-the-art compressed sensing technique shows
potential to further reduce the sampling rate while keeping all the relevant information. A large
number of Earth observation and space exploration satellites and ground stations generate large
amounts of space data. For processing this data before it is downloaded from space-based assets, two
main hardware options are preferred: scale-up processors and Field Programmable Gate Arrays (FPGAs).
FPGAs are usually configured for simple but computationally heavy tasks, while scale-up processors
run the more sophisticated algorithms.
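

The volume-reduction idea can be illustrated with a minimal sketch. It assumes a hypothetical line of
16-bit sensor samples, uses simple block averaging as a stand-in for onboard filtering and
downsampling, and uses the general-purpose zlib codec as a stand-in for a mission-specific compression
algorithm. It does not implement compressed sensing, and all names and values are illustrative.

```python
import zlib


def decimate(samples, factor):
    """Average each block of `factor` samples (crude low-pass filter plus downsampling)."""
    return [sum(samples[i:i + factor]) // factor
            for i in range(0, len(samples) - factor + 1, factor)]


def pack(samples):
    """Serialize unsigned 16-bit samples as big-endian bytes."""
    return b"".join(s.to_bytes(2, "big") for s in samples)


# Hypothetical sensor line: a slowly varying ramp (assumption for illustration only).
raw = [(i // 8) % 1024 for i in range(4096)]

reduced = decimate(raw, 4)                  # lossy step: fewer samples
payload = zlib.compress(pack(reduced), 9)   # lossless step: fewer bytes per sample
ratio = len(pack(raw)) / len(payload)       # overall reduction of the downlink volume

print(f"{len(pack(raw))} bytes -> {len(payload)} bytes (ratio {ratio:.1f})")
```

The two steps are kept separate on purpose: the decimation discards information and must be matched to
what the application needs, whereas the compression step is fully reversible on the ground.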


The second is to significantly improve the downloading capacity. With the latest Forward Error
Correction (FEC) and digital modulation techniques, the download link has reached performance very
close to its theoretical limit. According to Shannon's theory, capacity grows with bandwidth, so much
wider bandwidth is desirable to expand the download capacity. Further exploitation of potential
advances in radiofrequency techniques such as gallium nitride power amplifiers is expected in the near
future, and the higher V band (40-75 GHz) can provide more bandwidth as soon as the devices and
technologies are mature. Moving towards optical frequencies allows a very high data rate link. Though
the physics differs from that of radio communication, laser communication promises high-speed links in
the near future and has demonstrated its potential in the European Data Relay System (EDRS).
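

The role of bandwidth can be made concrete with the Shannon-Hartley formula, C = B log2(1 + S/N). The
sketch below is only an illustration: the two bandwidths and the signal-to-noise ratio are assumed
values chosen for the example, not figures for any real downlink.

```python
import math


def shannon_capacity_bps(bandwidth_hz, snr_db):
    """Upper bound on the error-free data rate of a channel (Shannon-Hartley)."""
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth_hz * math.log2(1 + snr_linear)


# Assumed example values: same signal-to-noise ratio, different bandwidths.
SNR_DB = 10.0
narrow_link = shannon_capacity_bps(375e6, SNR_DB)   # hypothetical narrow allocation
wide_link = shannon_capacity_bps(5e9, SNR_DB)       # hypothetical wide allocation

print(f"narrow: {narrow_link / 1e9:.2f} Gbit/s, wide: {wide_link / 1e9:.2f} Gbit/s")
```

At a fixed signal-to-noise ratio the bound scales linearly with bandwidth, whereas raising the
signal-to-noise ratio only helps logarithmically. This is why wider bands and optical links are the
main route to higher capacity once coding and modulation are already close to the limit.
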


The third is to extend the download time window. An existing solution is the Data Relay Satellite
System (DRSS), which provides a LEO-GEO-ground station link for LEO data download. However, the
existing multi-access DRSS provides S-band download capability for only five LEO satellites.
Ground-based active antenna systems offer a potential solution, but implementation issues such as
atmospheric calibration make them very difficult and expensive.


Another possible solution for continuous, multiple-access coverage is the commercial shared operation
of ground stations around the world. Ground operators should share their download links and start a
data downloading service industry, which governments should regulate. Ground stations across the world
can share technical specifications, including radiofrequency link budget, supported bands, and
modulations. Technically, modifications of the ground stations will be necessary, especially software
upgrades to support the missions of other operators. Sharing can happen at different levels.
Optimistically, in five years, ground stations will operate around the clock instead of waiting for
their own spacecraft to pass over.
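

The benefit of shared ground stations for the download time window can be sketched as an interval
problem: the usable contact time of a satellite is the union of its visibility windows over all
stations it is allowed to use. The pass times below are invented for illustration and are given in
minutes since midnight.

```python
def merge_windows(windows):
    """Merge overlapping (start, end) contact windows into disjoint ones."""
    merged = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous window: extend it instead of double counting.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_contact_minutes(windows):
    """Total time during which at least one station can receive the downlink."""
    return sum(end - start for start, end in merge_windows(windows))


# Hypothetical passes over the operator's own station and over shared partner stations.
own_station = [(60, 70), (155, 165)]
partner_stations = [(65, 80), (300, 312), (400, 409)]

print(total_contact_minutes(own_station))
print(total_contact_minutes(own_station + partner_stations))
```

In this made-up example the daily contact time rises from 20 to 51 minutes. Overlapping passes are
merged because a satellite with a single downlink transmitter cannot use two stations at once.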


8.2.2. Data Mining via AI


Modern artificial intelligence (AI) technologies have progressed significantly over the last decade,
and many companies are currently investing both time and money in AI capabilities (Qmohundro, 2015).
AI can be used as a solution to the data mining challenges identified. Unlike the text-based internet,
SBD takes the form of images, videos, and digitized signals, from which valuable information is harder
to analyze and extract. “This problem is known as the semantic gap and is defined as the lack of
concordance between low-level information, automatically extracted from the images, and high-level
information, analyzed by human beings” (Gangarski, 2014b).


Advances in artificial intelligence have shown that the performance of visual perception can be
greatly improved. The AlphaGo program uses sophisticated hybrid neural networks and demonstrates the
great potential of deep learning (DL) techniques. DL can be used to understand remote sensing (RS)
imagery and extract high-level information. Within the field of SBD, Orbital Insight can be regarded
as a pioneer. The openness of SBD is likely to greatly accelerate the development and maturity of AI
in the near future. A general framework of deep learning for RS data analysis is shown in Figure 14.


Figure 14 - A General Framework of Deep Learning for RS Data Analysis


The objectives of the DL network depend on the application. Given a huge amount of suitable input
data, the network acquires the ability to handle complex tasks that are difficult for traditional
domain knowledge and algorithms.


DL for space data mining is still a new area, but it has already shown great potential. For example,
researchers from Stanford have used it to predict poverty (Jean et al., 2016). As the theory and
application of DL continue to develop in many other areas, we believe that it may be the next
milestone in bridging the semantic gap and may bring substantial change to space data mining
problems.


Governments, industries, and academia should promote research on and application of AI in the SBD
industry. Governments and industries can help academia develop algorithms for SBD mining by
sponsoring education and fundamental research on combining AI and SBD, for example AI onboard a
satellite. Investing in academia could yield advanced technologies that solve practical problems and
new space big data applications that lead to new business models and revenue.


The uptake of AI in SBD can be tracked by its usage and popularity. The progress and success of the
proposed solution could therefore be tracked through the revenues of companies using AI for SBD mining
and through government support for those companies and for academia. However, the development and
application of AI arguably evolves gradually, rather than exponentially like data. There are also a
number of significant barriers to implementation, which could arguably slow it further. On the
technical side, DL techniques require huge computational capacity and domain knowledge, and need
more investment in research and development. Non-technical barriers include privacy and security
challenges. Access to space big data is another obstacle to the prevalence of AI technology. At the
current and expected rate of AI development, reliance on modern technology such as AI is likely still
about ten years away.


8.2.3. Data Distribution via Global Data Dissemination System


The solution to the data distribution challenges identified above is a global data dissemination
network among the different regions of the world. It will allow every region to share in the
distribution of the raw data and to share local resources for processing the data and extracting the
useful information. This global distribution can be accomplished through a constellation of GEO/MEO
satellites acting as an interregional backhaul network (Eumetsat, 2016b). High-throughput multi-beam
satellite systems are a good candidate. The information will be distributed among the users in each
region through an optical network or through digital broadcast via satellite (Eumetsat, 2016b). A
generic proposal for global distribution is shown in Figure 15.


Figure 15 - Global Data Distribution Network


The distribution network can be broken down into three main components: the primary network, the
secondary network, and the computing and data processing system.


The primary network component mainly deals with the distribution of the data being downloaded from 
different satellites either directly or indirectly through Data Relay Satellite Systems. The main purpose 
of this network is to transmit and receive data across different widely separated regions of the globe. A 
GEO satellite would be the most effective solution to provide connectivity between separate places. 
With a constellation of three GEO satellites, it is possible to cover almost the entire globe. The
satellites in the GEO constellation will be linked to allow for intercommunication, which will
simplify the ground infrastructure needed. However, configuring high-capacity inter-satellite links
will be a challenge, for which optical communication is a potential solution. Additionally, every
LEO satellite will need a communication system to be able to upload its data to a GEO satellite. The
primary nodes could be connected to the interface ground station and, after first-level processing,
the data could be distributed to scientists or handed over to a high-performance computing system for
more complicated processing.
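

The statement that three GEO satellites cover almost the entire globe can be checked with basic
visibility geometry. The sketch below assumes a spherical Earth and uses the standard relation between
the Earth-central angle of the visible cap and the minimum elevation angle at the ground station. The
function names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6378.0     # equatorial radius
GEO_ALTITUDE_KM = 35786.0    # geostationary altitude above the equator


def coverage_half_angle_deg(min_elevation_deg=0.0):
    """Earth-central half-angle of the cap from which a GEO satellite is visible."""
    elevation = math.radians(min_elevation_deg)
    ratio = EARTH_RADIUS_KM / (EARTH_RADIUS_KM + GEO_ALTITUDE_KM)
    return math.degrees(math.acos(ratio * math.cos(elevation)) - elevation)


def covered_surface_fraction(half_angle_deg):
    """Fraction of a sphere's surface inside a cap of the given half-angle."""
    return (1 - math.cos(math.radians(half_angle_deg))) / 2


horizon_limit = coverage_half_angle_deg(0.0)     # satellite just on the horizon
practical_limit = coverage_half_angle_deg(10.0)  # assumed 10 degree minimum elevation

print(f"half-angle: {horizon_limit:.1f} deg, "
      f"single-satellite coverage: {covered_surface_fraction(horizon_limit):.0%}")
```

Each satellite sees a cap of about 81 degrees half-angle, or roughly 42% of the surface. Three
satellites spaced 120 degrees apart need only 60 degrees each along the equator, so the caps overlap
there, while latitudes above about 81 degrees remain out of view. That gap is why the coverage is
"almost" global.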


There are two types of processing systems: computationally intensive systems with high-level
processing capabilities, and data-intensive systems with high-level data storage and transfer
capabilities. Remote sensing applications are data intensive, since a large amount of data has to be
processed as soon as possible. However, producing high-level data products still requires a large
amount of processing. One solution could be a cluster of multiple computers (scale-out) collaborating
on the data processing task while presenting a single system to the user. Google has attempted this
for its computation-intensive tasks and has built a large cluster system consisting of nearly 1500
ordinary personal computers. However, this kind of cluster computing system is not an optimal
solution for remote sensing big data processing.


Regarding hardware, petascale supercomputers have become the primary platforms used in the
scientific community. These systems continue to evolve. However, they are “not good at loading,
transferring and processing extremely large volume of data” (Ma et al. 2015). Thus, new systems are 
required to have a “higher dimensional connection topology and multi-level storage architecture as 
well” (Ma et al. 2015). The Gordon system (Caulfield, Grupp, Swanson, 2010) is one good example of a 
new system designed for data-centric applications. “However, for performance efficiency it is critically 
important to take data locality into account” (Ma et al. 2015). 


Another solution consists of using computing infrastructures connected via a cloud, which behaves like
a virtualized computer with a flexible number of processors and a flexible amount of memory and disk
space. Cloud computing encompasses not only distributed computing but also distributed storage and
distributed caching. It delivers not only application and software services but also infrastructure
and platform support services. Another advantage of the cloud is the capability of storing big data
with effective scalability (Ma et al., 2015).


Figure 16 - Distributed Architecture Based on Clouds (Chen, 2015)


One of the most important examples of a cloud system for big data computing is Apache Hadoop.
Hadoop enables distributed, data-intensive, and parallel applications. It is built on the Hadoop
Distributed File System (HDFS), a large distributed file system that relies on strategic data layout
and data replication for fault tolerance and better analysis performance. Recently, Yahoo has
decided to convert its computation process to use a Hadoop cluster. Facebook and eBay also use
HDFS to develop their large applications representing exabytes of data. In addition, the Hadoop-GIS
system, mainly used for intensive spatial data processing, search, and access, is also built upon
Hadoop (Ma et al., 2015).
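

The programming model that Hadoop-style systems run at scale is MapReduce. The following sketch
imitates its three phases in plain Python on a single machine. It does not use the Hadoop API, and the
tile identifiers and pixel values are invented for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (tile_id, (partial_sum, count)) for one block of pixel values."""
    tile_id, pixel_values = record
    yield tile_id, (sum(pixel_values), len(pixel_values))


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine the partial sums of one tile into its mean pixel value."""
    total = sum(partial_sum for partial_sum, _ in values)
    count = sum(n for _, n in values)
    return key, total / count


# Hypothetical image blocks, as they might be spread over several storage nodes.
records = [("tile_A", [10, 20, 30]), ("tile_B", [5, 15]), ("tile_A", [40])]

pairs = [pair for record in records for pair in map_phase(record)]
means = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(means)
```

Because each map call touches only its own block, the map phase can run on the node that already
stores that block. This is the data locality that a replicated distributed file system is designed to
exploit.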


The secondary network is responsible for the regional distribution of the data, the last step of
connectivity to the end users. We have identified two different ways to distribute the desired data.


Our first solution is to utilize the existing terrestrial optical fiber network infrastructure. The
data will be distributed through internet nodes to users, who will gain access through a common
website. This poses no particular challenge, as it simply applies the current system. The second way
of distributing the data is through a satellite broadcast approach similar to a direct-to-home
service. Users can receive the data by installing antennas similar to those used for satellite TV and
satellite-based internet access. The core challenges of such a global distribution system are
multi-dimensional, spanning international relations, policy, and technology. The recommendations for
each are discussed below.


International Relations: As the network has to be operated across and accessed from different parts of
the world, there should be an international consortium of the participating countries. There should be
two levels of cooperation: one at the global level for the communication network infrastructure, and
the other at the regional level for processing, for example through a cloud computing system.


Business: As the network is very expansive, countries must encourage cooperation with private 
industry as a public-private partnership project in which private companies will have rights to use data 
to a greater extent. 


International Policy: As some space big data, such as satellite imagery, is dual-use, an international
data distribution policy must be established to define and restrict illegal data use.


Technology Advancements: A global data dissemination system involves many technical challenges,
including the need for high-capacity geostationary satellites to connect the regions. New data
processing systems with advanced hardware and new algorithms must also be developed.


What should be done: Governments and the owners of ground infrastructure should allow sharing, so
that stations can share their downlinks for profit.


CASE STUDY: WIS 


An example of an integrated approach to collecting and distributing SBD information and knowledge is
the WMO Information System (WIS) of the World Meteorological Organization (WMO). WIS is used for
sharing weather, climate, water, and related environmental data produced and used by the member
organizations.


The European Organization for the Exploitation of Meteorological Satellites (EUMETSAT) is an 
intergovernmental organization collecting satellite and non-satellite data, producing meteorological 
products, and delivering data and products according to the meteorological requirements of its 
member states. It is also responsible for data distribution, either via direct dissemination (directly
from the weather satellites) or through the EUMETCast primary dissemination system (distribution
through broadcast via GEO satellites). Global distribution is carried out using the WMO Information
System (Eumetsat, 2016b).


8.3. Roadmap to handle Space Big Data Interoperability Challenges 


The key solution for standardization and interfacing challenges is to create a common core data access 
platform via an interoperability mechanism. 


8.3.1. Standardization and interfacing via interoperability 


Interoperability in SBD means standardization of the data and the metadata, and creation of a common 
interface, which can be achieved with the support of governments and policy makers. Their help is 
required to unite SBD users and leaders around the world and convince them to reach that common 
goal. The actual standardization can be done either through an international standardization 
organization or a broad consortium. The consortium should include a technical committee, specialized 
subcommittees, and working subgroups to set up common good practices, with which all would comply 
and to which all would contribute in their own interest. Eventually, achieving standardization will
make it possible to pursue the broader desired solution: the creation of a common core database
similar to the one envisioned by the Open Data Access Platform Association (Olavsrud, 2016). The goal
is to provide a common practical interface that will help stakeholders derive value from more
available data in a comprehensive way.


Our vision is to overcome the identified challenges and constraints and standardize data by 2020. The 
advantage of this process is that it will increase application usage, which will lead to increased social 
outreach and economic growth. By tracking these indicators, we will be able to globally monitor the 
completion of space big data standardization. 


We must consider how this method could actually be implemented. We highly recommend the creation of
a special desk or administrative entity for space data standardization at the International
Organization for Standardization (ISO) or the International Telecommunication Union (ITU) to help
foster an international framework for cooperation among all the users playing a role in the SBD value
chain. This desk will facilitate the implementation and improvement of interoperable standards as SBD
keeps increasing in astronomical proportions. All technical terms relating to SBD across the various
software architecture domains should be consistent with IUPAC (International Union of Pure and Applied
Chemistry) nomenclature and the International System of Units (SI) so that the same terminology is
used globally.


A big data reference architecture has already been crafted by ISO and IEC (2015). There should be 
continental sub-reference architectures specifically for space data by 2018, which could be networked 
globally by 2020. Infrastructures for uplink and downlink of space data must be upgraded by then to 
facilitate faster transfer of SBD using the new engineering solutions suggested in the previous chapter. 


Considering the reluctance of people and countries to share data, whether because of intellectual
property concerns, national security, or financial interest, one solution would be to separate space
data storage networks from all other network systems such as the Internet. It is recommended that
standards developed for the World Wide Web (WWW) be adapted to the SBD network, so that it can be set
at a security level suitable for all the stakeholders involved. Unfortunately, this separate network
will prevent some end users from having direct internet access to the database. This problem could be
addressed by using security bridges, which will filter the data and provide access depending on
user profiles. Figure 17 below illustrates a possible space big data interface architecture.


Figure 17 - The Architecture of a Big Data Portal (Kuo, 2016)


In the future, as described in the SBD ecosystem, funding for research and development should be
sourced from the governments of countries with space agencies (e.g., 1% of their GDPs) or through a
financial formula similar to that of the United Nations. Research findings and innovations pertaining
to the expansion of SBD storage and software architecture infrastructure could then become the
property of the global space community, to ensure that the ever-increasing volume of space data can be
effectively monitored and handled.


Given the large challenges facing data manufacturers and repositories, it will also be important to
recognize the developments and improvements provided by data manufacturing and storage software
and hardware engineers. A system that rewards data owners across the value chain when their data is
used would encourage them to be more open.


Public institutions’ data usage should be transparent to the data owners and to society, to ensure
confidence in and proper management of space data. Appropriate legal instruments could be put into
place through the ISO by 2018. These would address the psychological barriers to concerted efforts
toward global standardization. The right policy framework for space big data protection also has to
be settled, in light of globalization and to prevent any mischief from unscrupulous actors.


Even though some standards are available, there are still gaps remaining to be filled. The following gaps 
have to be addressed in the future for the complete interoperability of space big data: 


• Data security and private access controls
• Data sharing and exchange
• Data storage
• Interface between relational and non-relational data stores
• Synchronization of data across distributed computing environments
• Open source / open data platform


8.3.2. Concluding Remarks 


Standardization and interfacing is a major challenge, but also a major lever for bringing
interoperability to space big data and improving the production of valuable products and services. As
described earlier, standardization and interfacing represent a bottleneck that, if resolved, would
set off a chain reaction, improving every stakeholder’s processes.


A standardized common database and its supporting interface will help to support the growth of space
big data. To extract additional value, a comprehensive method is needed to involve each actor and
focus their efforts in a common direction.


8.4. Roadmap to Handle Privacy vs. Openness 


Many individuals are under the impression that national and international law prevents data openness
or, conversely, that law is the solution. In fact, neither is the case. Openness is more a technical,
policy, and business issue. Businesses want to keep data confidential to maximize profits. Making the
data accessible is a technical challenge. Many countries hold policies against data openness because
of national security issues. The current legal regime and court system do allow for the balancing of
privacy against openness. The fact that something includes personal data does not mean it cannot be
used; its use essentially needs to be proportionate. Where the balance lies with regard to what is
proportionate is a matter of policy. For instance, Israel does not allow high-resolution remote
sensing images of its territory to be shared. The law does not prevent this, but the security
status of the country does. Thus, as a primary step, there needs to be greater awareness of the laws
applicable to each company and situation. The commonality among all legal systems regarding privacy
is that data collection should be proportionate to national security concerns and the protection of
personal data (DLA PIPER, 2016). Thus, regardless of whether the data is open, basic privacy and data
protection laws still need to be met. The question that remains is: To what extent should we force
stakeholders to make their data open? There is no legal requirement for that.


Our recommendation from the previous chapter is to build an international common database that will 
be shared among all the stakeholders. We realize that this may be a very complex and challenging 
solution because many entities want to capitalize on their data. Another option is to encourage
self-regulation. This gives the industry itself the power to come together and devise solutions that
will be best for all. The solutions should provide an appropriate balance between what data is open
and what remains confidential.


It should be remembered at this stage that every nation has its own legal regime on privacy and data
sharing, and that each is a comprehensive system of laws. An alternative solution is self-regulation,
enabling companies to govern themselves and decide what data can and should be made open, and how.
This leads to the clear question of ethics and cultural awareness with regard to companies’
methodologies and priorities.


8.4.1. Common shared database 


For a common shared database to function, every entity involved must be convinced of such a need 
and be able to see the advantages. As explained above, openness will promote their data, and may 
enlarge their market whether they sell the data or not. From the open data point of view, it will also 
increase the benefit of space data for humankind. A shared database will provide the added benefit of 
interoperability. The common shared database can even accommodate specific considerations such as
national security interests. The challenge is to guarantee that the data will be sorted very
carefully and labeled in such a manner that every stakeholder is assured that the data is handled
according to individual preferences and the relevant national regulations. Data shall be sorted into
at least the following three categories:


• Public data expected to be open and freely accessible
• Data related to national security concerns
• Private data that is to be monetized


Indeed, data that comes out of publicly funded systems is more likely to be shared broadly. One
example is astronomical data. On the other hand, private companies will want to sell every
dataset for profit. National security concerns will override every player’s interest and
restrain people from sharing sensitive information. Access to this database will be provided by an
interface that will offer a large variety of user profiles. These user profiles shall be very specific
and are the primary requirement for securing the database. The interface should control precisely who
has access to what, and how. All of this has to be taken into consideration as primary technical
constraints, which should be addressed with a certain level of flexibility to allow countries to keep
control of their data. This is a technical challenge that can be addressed with robust algorithms for
gathering the data or with a solid database architecture.
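

How user profiles could gate access to the three data categories can be sketched as follows. The
rules are assumptions made for illustration only (public data open to all, national security data
limited to cleared users from the owning country, private data limited to licensed users), and the
class and field names are hypothetical rather than part of any existing system.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    PUBLIC = "public"
    NATIONAL_SECURITY = "national_security"
    PRIVATE = "private"


@dataclass(frozen=True)
class Dataset:
    name: str
    category: Category
    owner_country: str


@dataclass(frozen=True)
class UserProfile:
    name: str
    country: str
    security_cleared: bool = False
    licensed: frozenset = frozenset()   # names of private datasets the user has paid for


def can_access(user, dataset):
    """Apply the assumed access rule for the dataset's category."""
    if dataset.category is Category.PUBLIC:
        return True
    if dataset.category is Category.NATIONAL_SECURITY:
        return user.security_cleared and user.country == dataset.owner_country
    return dataset.name in user.licensed


def visible(user, catalog):
    """Names of the datasets that the interface would show to this user."""
    return [d.name for d in catalog if can_access(user, d)]


catalog = [
    Dataset("astronomy_catalog", Category.PUBLIC, "FR"),
    Dataset("hires_imagery", Category.NATIONAL_SECURITY, "FR"),
    Dataset("crop_yield_product", Category.PRIVATE, "US"),
]
student = UserProfile("student", "IN")
analyst = UserProfile("agency_analyst", "FR", security_cleared=True)
customer = UserProfile("customer", "BR", licensed=frozenset({"crop_yield_product"}))

print(visible(student, catalog), visible(analyst, catalog), visible(customer, catalog))
```

Keeping the rule in one function per category makes it possible for each country or data owner to
substitute its own policy without changing the rest of the interface.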


Using a common interface to this database will allow companies or other entities to provide processing 
tools, applications, and services online. This type of website could use a bridge if needed to access a 
separate network, and would offer the opportunity to promote any kind of valuable use of the SBD. 
Still, considering that it would be wise to have a regulatory body controlling the whereabouts of the 
data, we recommend that the Committee on the Peaceful Uses of Outer Space (COPUOS) be assigned 
to this mission. The mission will have to be updated by consensus, which is compatible with the 
requirements of every nation. A subcommittee should be dedicated to the management of the 
interface and common shared database, which should comply with international standards in the 
meantime. 


The legal framework necessary to implement the above openness plan in terms of end user privacy
considerations is already available; it is close to the one used across the Internet and similar to
the one used by Google. Even if a separate network is built to provide more secure archives and to
relieve every player of the fear of any risks, privacy laws already exist within the legal framework
of every country. These laws already cover this kind of situation, since this is a basic legal
principle (DLA PIPER, 2016). What would be new is a law that requires companies to place their data
in this database. Such a law may face privacy challenges; however, it is hoped that the courts would
judge the proportionality requirement in favor of the common shared database. The time frame should
not be an issue, as the major time constraint would come from the technical work of compiling all of
the data in a standardized format. A potentially larger issue would come from a policy perspective:
convincing all of the stakeholders and nations that this is what is best to support SBD growth and to
extract more and more value from it.


8.4.2. Self-Regulation 


Self-regulation is a popular idea in many industries: the industry is allowed to regulate itself, 
and its members decide together to what extent data should be made open. Notions of 
ethics could govern this, especially in the case of crisis management data or data for educational 
purposes, such as celestial data. For data that is not to be open, companies should identify how 
they can communicate and promote the fact that such data exists and how people could access it. This 
would enable a cohesive approach to the problem, in which scientists and academics would be able 
either to find the data or to know whom to ask. 


8.5. Conclusion 


Four roadmaps are outlined above, detailing the solutions that reflect the four prime challenges. The 
roadmaps should be acted upon as soon as possible to optimally deal with the challenges. It is 
important to note that all of the solutions are interrelated and interdisciplinary, spanning fields from 
engineering to law. Technology can be implemented with political and legal support to provide better 
results for SBD end users and customers. This will in turn increase business profit. 
9. Conclusion 


To define space big data, map the activities of stakeholders, identify challenges they face, 
suggest potential solutions, and outline recommendations to promote growth of the space 
big data sector. 


We began this report with the words ‘space’, ‘big’ and ‘data’ and distilled the above mission statement. 
Now we have a report that has indeed defined space big data, mapped the activities of stakeholders, 
identified the challenges and proposed solutions and recommendations for creating value and 
overcoming the barriers related to the continuing growth of the SBD sector. 


This analysis of the SBD sector comes at a time when the space industry is realizing the importance of 
utilizing its big data capabilities and while the generation of space data is growing rapidly. There is a 
need at the moment to understand SBD, who the players are and how it can be better harnessed and 
utilized. 


SBD has moved from a government-centric industry towards a more commercial approach. End users 
and customers are becoming the starting block and motivation for future developments. In addition to 
end users, commercial actions rely on technological development and political attitudes toward SBD. 


As a whole, the SBD industry is still undeveloped, relying primarily on technologies and marketability 
from ancillary industries. That said, there is a clear trend towards increased reliance on data from space 
in general, and there is evidence for the exponential growth in the generation of big data coming from 
the space sector. After thorough research and analysis, our SBD team identified clear challenges and 
barriers that must be overcome for SBD to achieve its true potential. It is our hope that addressing them 
will eventually lead to the normalization of SBD in both government and commercial settings. Furthermore, once the challenges 
are overcome, this will enable new opportunities for the space sector and allow it to develop in ways 
that are currently not possible. 


One of the main takeaways from our research is that, if developed and utilized properly, SBD has the 
potential to completely change not only the space industry, but also industry as a whole. For it to 
benefit all industries, SBD should be brought to the forefront of scientific industry as a primary tool for 
development. For this to be possible, there must be a realization that SBD is a highly applicable field, 
and action must be taken to move away from traditional approaches to data dissemination 
and usage. SBD has already created completely novel opportunities, risks, and approaches to 
business and security. However, it should be acknowledged that SBD is only at the beginning of its potential 
capabilities. 


Our work on SBD led us to another surprising conclusion, which is that space big data is still largely 
misunderstood. When big data was still emerging as a concept, the debate was over what constituted 
the shift from data to big data. That shift came when a technical limitation was overcome so that data 
could continue to be processed and value extracted. This report shows that a similar shift is now taking place in the emerging SBD 
domain. The SBD industry is at a turning point, with players like Orbital Insight Inc. leading the way. On 
the road ahead, to maximize the potential from SBD, it is extremely important that the SBD industry’s 
players work together. Now is the time to act. Coordinated efforts will ensure the SBD industry 
continues to grow rapidly while being utilized effectively for the benefit of the space industry and the 
world as a whole. 